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Describing tine collective activity of neural populations is a daunting task. Recent empirical 
studies in retina, however, suggest a vast simplification in how multi-neuron spiking 
occurs: the activity patterns of retinal ganglion cell (RGC) populations under some 
conditions are nearly completely captured by pairwise interactions among neurons. 
In other circumstances, higher-order statistics are required and appear to be shaped 
by input statistics and intrinsic circuit mechanisms. Here, we study the emergence 
of higher-order interactions in a model of the RGC circuit in which correlations are 
generated by common input. We quantify the impact of higher-order interactions by 
comparing the responses of mechanistic circuit models vs. "null" descriptions in which all 
higher-than-pairwise correlations have been accounted for by lower order statistics; these 
are known as pairwise maximum entropy (PME) models. We find that over a broad range 
of stimuli, output spiking patterns are surprisingly well captured by the pairwise model. To 
understand this finding, we study an analytically tractable simplification of the RGC model. 
We find that in the simplified model, bimodal input signals produce larger deviations from 
pairwise predictions than unimodal inputs. The characteristic light filtering properties of 
the upstream RGC circuitry suppress bimodality in light stimuli, thus removing a powerful 
source of higher-order interactions. This provides a novel explanation for the surprising 
empirical success of pairwise models. 

Keywords: retinal ganglion cells, maximum entropy distribution, stimulus-driven, correlations, computational 
model 



1. INTRODUCTION 

Information in neural circuits is often encoded in the activity of 
large, highly interconnected neural populations. The combina- 
toric explosion of possible responses of such circuits poses major 
conceptual, experimental, and computational challenges. How 
much of this potential complexity is realized? What do statistical 
regularities in population responses tell us about circuit architec- 
ture? Can simple circuit models with limited interactions among 
cells capture the relevant information content? These questions 
are central to our understanding of neural coding and decoding. 

Two developments have advanced studies of synchronous 
activity in recent years. First, new experimental techniques pro- 
vide access to responses from the large groups of neurons neces- 
sary to adequately sample synchronous activity patterns (Baudry 
and Taketani, 2006). Second, maximum entropy approaches 
from statistical physics have provided a powerful approach to 
distinguish genuine higher-order synchrony (correlations) from 
that explainable by pairwise statistical interactions among neu- 
rons (Martignon et al., 2000; Amari, 2001; Schneidman et al., 
2003). These approaches have produced diverse findings. In 
some instances, activity of neural populations is extremely 
well described by pairwise interactions alone, so that pairwise 
maximum entropy (PME) models provide a nearly complete 
description (Shlens et al., 2006, 2009). In other cases, while 
pairwise models bring major improvements over independent 



descriptions, it is not clear that they fuUy capture the data 
(Martignon et al, 2000; Schneidman et al, 2006; Tang et al, 2008; 
Yu et al, 2008; Montani et al, 2009; Ohiorhenuan et al, 2010; 
Santos et al., 2010). Empirical studies indicate that pairwise mod- 
els can fail to explain the responses of spatially localized triplets 
of cells (Ohiorhenuan et al., 2010; Ganmor et al., 2011), as well 
as the activity of populations of ~ 100 cells responding to natural 
stimuli (Ganmor et al., 2011). Overall, the diversity of empirical 
results highlights the need to understand the network and input 
features that control the statistical complexity of synchronous 
activity patterns. 

Several themes have emerged from efforts to link the corre- 
lation structure of spiking activity to circuit mechanisms using 
both abstract (Amari et al, 2003; Krumin and Shoham, 2009; 
Macke et al, 2009; Roudi et al, 2009a) and biologically-based 
models (Bohte et al, 2000; Martignon et al., 2000; Roudi et al, 
2009b); these models, however, do not provide a full description 
for why the PME models succeed or fail to capture neural cir- 
cuit dynamics. First, thresholding non-linearities in circuits with 
Gaussian input signals can generate correlations that cannot be 
explained by pairwise statistics (Amari et al., 2003); the deviations 
from pairwise predictions are modest at moderate population 
sizes (Macke et al., 2009), but may become severe as population 
size grows large (Amari et al., 2003; Macke et al, 2011). The pair- 
wise model also fails in networks of recurrent integrate-and-fire 



Frontiers in Computational Neuroscience 



www.frontlersin.org 



February 2014 | Volume 8 | Article 10 | 1 



Barreiro et al. 



Beyond-pairwise correlations in microcircuits 



units with adapting thresholds and refractory potassium currents 
(Bohte et al., 2000). The same is true for "Boltzmann-type" net- 
works with hidden units (Koster et al., 20 1 3). Finally, small groups 
of model neurons that perform logical operations can be shown 
to generate higher-order interactions by introducing noisy pro- 
cesses with synergistic effects (Schneidman et al., 2003), but it is 
unclear what neural mechanisms might produce similar distri- 
butions. These diverse findings point to the important role that 
circuit features and mechanisms — input statistics, input/output 
relationships, and circuit connectivity — can play in regulating 
higher-order interactions. Nevertheless, we lack a systematic 
understanding that links these features and their combinations to 
the success and failure of pairwise statistical models. 

A second theme that has emerged is the use of pertur- 
bation approaches to explain why maximum entropy models 
with purely pairwise interactions capture circuit behavior in the 
limit in which the population firing rate is very low (i.e., the 
total number of firing events from all cells in the same small 
time window is small) (Cocco et al, 2009; Roudi et al., 2009a; 
Tkacik et al., 2009). Also in this regime, higher-order inter- 
actions cannot be introduced as an artifact of under-sampling 
the network (Tkacik et al, 2009), a concern at higher popu- 
lation firing rates. However, the low to moderate population 
firing rates observed in many studies permit a priori a fairly 
broad range in the quality of pairwise fits. What is left to explain 
then is why circuits operating outside the low population fir- 
ing rate regime often produce fits consistent with the PME 
model. 

We approach this issue here by systematically characterizing 
the ability of PME models to capture the responses of a class 
of circuit models with the following defining features. First, we 
consider relatively small circuits of 3-16 cells, each with iden- 
tical intrinsic dynamics (i.e., spike-generating mechanism and 
level of excitability). Second, we assume a particular structure for 
inputs across the circuit. Each neuron receives the same global 
input which, for example, represents stimuli in the receptive 
fields of all modeled cells. Neurons also receive an independent, 
Gaussian-like noise term. Third, the circuit has either no recipro- 
cal coupling, or has aU-to-aU excitatory or gap junction coupling. 
We begin with circuit models fully constrained by measured 
properties of primate ON parasol ganglion networks, receiving 
full-field and checkerboard light inputs. We then explore a sim- 
ple thresholding model for which we exhaustively search over the 
entire parameter space. 

We identify general principles that describe higher-order spike 
correlations in the circuits we study. First, in all cases we exam- 
ined, the overall strength of higher-order correlations are con- 
strained to be far lower than the statistically possible limits. 
Second, for the higher-order correlations that do occur, the pri- 
mary factor that determines how significant they will be is the 
bimodal vs. unimodal profile of the common input signal. A sec- 
ondary factor is the strength of recurrent coupling, which has a 
non-monotonic impact on higher-order correlations. Our find- 
ings provide insight into why some previously measured activity 
patterns are well captured by PME descriptions, and provide pre- 
dictions for the mechanisms that allow for higher-order spike 
correlations to emerge. 



2. RESULTS 

2.1. QUANTIFYING HIGHER-ORDER CORRELATIONS IN NEURAL 
CIRCUITS 

One strategy to identify higher-order interactions is to com- 
pare multi-neuron spike data against a description in which 
any higher-order interactions have been removed in a principled 
way — that is, a description in which all higher-order correlations 
are completely described by lower-order statistics. Such a descrip- 
tion may be given by a maximum entropy model (Jaynes, 1957a,b; 
Amari, 2001), in which one identifies the most unstructured, or 
maximum entropy, distribution consistent with the constraints. 
Comparing the predicted and measured probabilities of differ- 
ent responses tests whether the constraints used are sufficient 
to explain observed network activity, or whether additional con- 
straints need to be considered. Such constraints would produce 
additional structure in the predicted response distribution, and 
hence lower the entropy. 

A common approach is to limit the constraints to a given sta- 
tistical order — for example, to consider only the first and second 
moments of the distributions, which are determined by the mean 
and pairwise interactions. In the context of spiking neurons, we 
denote |X/ = E[x,] as the firing rate of neuron i and py = E[x,Xj] 
as the joint probability that neurons i and j will fire. The distribu- 
tion with the largest entropy for a given |x,- and is referred to as 
the PME model. 

We use the KuUback-Leibler divergence, DklC^", F), to quan- 
tify the accuracy of the PME approximation P to a distribution 
P. This measure has a natural interpretation as the contribution 
of higher-order interactions to the response entropy S(P) (Amari, 
2001; Schneidman et al., 2003), and may in this context be written 
as the difference of entropies S(P) — S(P). In addition, Dyj^{P, F) 
is approximately — log2 L, where L is the average likelihood (over 
different observations) that a sequence of data drawn from the 
distribution P was instead drawn from the model P (Cover and 
Thomas, 1991; Shlens et al., 2006). For example, if Dkl(P, f") = 1, 
the average likelihood that a single sample, i.e., a single network 
response, came from P relative to the likelihood that it came from 
P is 2^^ (we use the base 2 logarithm in our definition of the 
KuUback-Leibler divergence, so all numerical values are in units 
ofbits). 

An alternative measure of the quality of the pairwise model 
comes from normalizing Dkl (P, P) by the corresponding distance 
of the distribution P from an independent maximum entropy fit 
■Dkl(^', Pi)j where Pi is the highest entropy distribution consis- 
tent with the mean firing rates of the cells (equivalently, the prod- 
uct of single-ceE marginal firing probabilities) (Amari, 2001). 
Many studies (Schneidman et al, 2006; Shlens et al, 2006, 2009; 
Roudi et al, 2009a) use 

Okl {P, P) 

A = l ^ ; (1) 

Okl(P,Pi) 

a value of A = 1 indicates that the pairwise model perfectly 
captures the additional information left out of the independent 
model, while a value of A = 0 indicates that the pairwise model 
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gives no improvement over the independent model. To aid com- 
parison with other studies, we report values of A in parallel with 
Dkl{P, P) when appropriate. 

We next explore and interpret the achievable range of 
Dkl(P, P) values. The problem is made simpler if, following pre- 
vious studies (Bohte et al, 2000; Amari, 2001; Macke et al, 2009; 
Montani et al., 2009), we consider only permutation-symmetric 
spiking patterns, in which the firing rate and correlation do not 
depend on the identity of the cells; i.e., |x,- = |x, p,j = p for i / j. 
We start with three cells having binary responses and assume 
that the response is stationary and uncorrelated in time. From 
symmetry, the possible network responses are 



po = P[(0,0,Q)] 

pi=P [(1, 0, 0)] = P 1(0, 1, 0)] = P 1(0, 0, 1)] 
P2 = 1, 0)] = P [(1, 0, 1)] = P [(0, 1, 1)] 

P3 = i' [(1,1,1)], 

where p,- denotes the probability that a particular set of i cells spike 
and the remaining 3 — i do not. Possible values of (po, pi, p2, Ps) 
are constrained by the fact that P is a probability distribution, so 
that the sum ofp, over all eight states is one. 

To assess the numerical significance of Dxl(P, P), we can com- 
pare it with the maximal achievable value for any symmetric 
distribution on three spiking cells. For three cells, the maxi- 
mal value is Dkl(P, P) = I (or 1/3 bits per neuron), achieved 
by the XOR operation (Schneidman et al., 2003). This distri- 
bution is illustrated in Figure 1 A (right), together with two 



distributions produced by our mechanistic circuit models — 
illustrating observed deviations from PME fits for unimodal (left) 
and bimodal (middle) distributions of inputs (see below). The 
i<X- divergence for these two patterns is 0.0013 and 0.091, respec- 
tively. As suggested by these bar plots (and explored in detail 
below), the distributions produced by a wide set of mechanistic 
circuit models are quite well captured by the PME approximation: 
to use the likelihood interpretation described above, an observer 
would need to draw many more samples from these distributions 
in order to distinguish between the true and model distributions: 
~1000 times and ~10 times, respectively, in comparison to the 
XOR operator. 

To further identify appropriate "benchmark" values of 
Dkl(P, P) with which to compare our mechanistic circuit mod- 
els, in Figure IB we show plots of Dkl(P, P) and A vs. firing rate 
produced by an exhaustive sampling of symmetric distributions 
on three cells. From this picture, we can see that it is possible to 
find symmetric, three-cell spiking distributions that are poorly 
fit by the pairwise model at a range of firing rates and pairwise 
correlations, with the largest values of Dkl (P, P) found at low cor- 
relations (note that the XOR distribution has an average pairwise 
covariance of zero (i.e., E[XiX2] = E[Xi] E[X2])). 

2. 1. 1. A condition for higlier-order correlations 

Possible solutions to the symmetric PME problem take the form 
of exponential functions characterized by two parameters, Xi and 
^2, which serve as Lagrange multipliers for the constraints: 

1 



P [(Xl,X2, Xi)] = - exp [Xl (Xl +X2+ X3) + 
X2 (X1X2 +X2X3 +XlXi)] . 



(2) 
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FIGURE 1 I A survey of the quality of the pairwise maximum entropy 
(PIVIE) model for symmetric spil<ing distributions on three cells. (A) 

Probabiiity distribution P (dark biue) and pairwise approximation P (iiglit 
pink) for three example distributions. From left to right: an example from 
the simple sum-and-threshold model receiving skewed common input; an 
example from the sum-and-threshold model receiving bimodal common 
input [specifically, the distribution with maximal Dkl(P, P)1; a specific 
probability distribution resulting from application of the XOR operator [for 



illustration of a "worst case" fit of the PME model (Schneidman et a!., 
2003)1. (B) Dkl(P, ~P) vs. firing rate and A vs. firing rate, for a 
comprehensive survey of possible symmetric spiking distributions on three 
cells (see text for details). Firing rate is defined as the probability of a 
spike occurring per cell per random draw of the sum-and-threshold model, 
as defined in Equation (16). Color indicates output correlation coefficient p 
ranging from black for p e (0, 0.1), to white for p € (0.9, 1), as illustrated in 
the color bars. 
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The factor Z normalizes _P to be a probability distribution. 

By combining individual probabilities of events as given by 
Equation (2) the following relationship must be satisfied by any 
symmetric PME solution: 



p3 



(3) 



This is equivalent to the condition that the strain measure of 
Ohiorhenuan and Victor (2010) be zero (in particular, the strain 
is negative whenever P3/P0 — (pi/pi)^ < 0, a condition identified 
in Ohiorhenuan and Victor (2010) as corresponding to sparsity in 
the neural code). 

For three-cell, symmetric networks, models that exactly satisfy 
Equation (3) will also be exactly described via PME. Moreover, 
note that probability models that meet this constraint fall on 
a surface in the space of (normalized) histograms, given by 
the probabilities pj. One can verify by straightforward calcula- 
tions (see Appendix) that — given fixed lower order moments — 
Dkl{P, P) is a convex function of the probabilities pj. This has 
interesting consequences for predicting when large vs. small val- 
ues of DklC^", P) will be found (see Appendix). 

It is not necessary to assume permutation symmetry when 
deriving the PME fit P to an observed distribution P, or 
in computing derived quantities such as D^i{P,P), and we 
do not do so in this study. However, most of the distri- 
butions we study are derived from mechanistic models that 
are themselves symmetric or near-symmetric. Therefore, we 
anticipate that the simplified calculations for permutation- 
symmetric distributions will yield analytical insight into our 
findings. 

2.2. MECHANISMS THAT IMPACT BEYOND-PAIRWISE CORRELATIONS 
IN TRIPLETS OF ON-PARASOL RETINAL GANGLION CELLS 

Having established the range of beyond-pairwise correlations that 
are possible statistically, we turn our focus to coding in retinal 
ganglion cell (RGC) populations, an area that has received a great 
deal of attention empirically. Specifically, PME approaches have 
been effective in capturing the activity of small RGC popula- 
tions (Schneidman et al, 2006; Shlens et al, 2006, 2009). This 
success does not have an obvious anatomical correlate; there 
are multiple opportunities in the retinal circuitry for interac- 
tions among three or more ganglion cells. We explored circuits 
composed of three RGC cells with input statistics, recurrent 
connectivity and spike-generating mechanisms based directly 
on experiment. We based our model on ON parasol RGCs, 
one of the RGC types for which PME approaches have been 
applied extensively (Shlens et al., 2006, 2009). In addition, by 
examining how marginal input statistics are shaped by stimu- 
lus filtering, we also reveal the role that the specific filtering 
properties of ON parasol cells have in shaping higher-order 
interactions. 

2.2.1. RGC model 

We modeled a single ON parasol RGC in two stages (for details 
see section 4). First, we characterized the light-dependent excita- 
tory and inhibitory synaptic inputs to cell k (gl^'^(t), g]f^it)) in 



response to randomly fluctuating light inputs Sk(t) via a linear- 
nonlinear model, e.g.,: 



(4) 



where N'^^'^ is a static non-linearity, L'^'"^ is a linear filter, and ti™ is 
an effective input noise that captures variability in the response to 
repetitions of the same time-varying stimulus. These parameters 
were determined from fits to experimental data collected under 
conditions similar to those in which PME models have been tested 
empirically (Shlens et al, 2006, 2009; Trong and Rieke, 2008). The 
modeled excitatory and inhibitory conductances captured many 
of the statistical features of the real conductances, particularly the 
correlation time and skewness (data not shown). 

Second, we used Equation (4) and an equivalent expression 
forg™'^(f) as inputs to an integrate-and-fire model incorporating 
a non-linear voltage and history-dependent term to account for 
refractory interactions between spikes (Badel et al., 2007, 2008). 
The voltage evolution equation was of the form 



dV 
dt 



f (V, f-flast) + 



^ input 



(f) 



c 



(5) 



where -F (V, t — tiast) was allowed to depend on the time of the last 
spike tiast. Briefly, we obtained data from a dynamic clamp exper- 
iment (Sharpe et al., 1993; Murphy and Rieke, 2006) in which 
currents corresponding to g™(f) and ^'"'^(f) were injected into 
a cell and the resulting voltage response measured. The input 
current /input injected during one time step was determined by 
scaling the excitatory and inhibitory conductances by driving 
forces based on the measured voltage in the previous time step; 
that is, 



input 



(0 



'(f) (V - Ve) 



(0 (V - Vi) , (6) 



We used this data to determine -F and C using the procedure 
described in Badel et al. (2007); details, including values of all fit- 
ted parameters, are described in section 4. Recurrent connections 
were implemented by adding an input current proportional to the 
voltage difference between the two coupled cells. 

The prescription above provided a flexible model that allowed 
us to study the responses of three-cell RGC networks to a wide 
range of light inputs and circuit connectivities. Specifically, we 
simulated RGC responses to light stimuli that were (1) con- 
stant, (2) time-varying and spatially uniform, and (3) varying 
in both space and time. Correlations between cell inputs arose 
from shared stimuli, from shared noise originating in the retinal 
circuitry (Trong and Rieke, 2008), or from recurrent connec- 
tions (Dacey and Brace, 1992; Trong and Rieke, 2008). Shared 
stimuli were described by correlations among the light inputs s^. 
Shared noise arose via correlations in r\'j^'^ and r\^^^ as described in 
section 4. The recurrent connections were chosen to be consistent 
with observed gap-junctional coupling between ON parasol cells. 
We also investigated how stimulus filtering by L'^"'^ and V^^ influ- 
enced network statistics. To compare our results with empirical 
studies, constant light, and spatially and temporally fluctuating 
checkerboard stimuli were used as in Shlens et al. (2006, 2009). 



Frontiers in Computational Neuroscience 



www.frontiersin.org 



February 2014 | Volume 8 | Article 10 | 4 



Barreiro et al. 



Beyond-pairwise correlations in microcircuits 



2.2.2. The feedforward RGC circuit is well-described by the PME 
model for full-field light stimuli 

We start by considering networks without recurrent connectivity 
and with constant, full-field (i.e., spatially uniform) light stimuli. 
Thus, we set sj-(f) = 0 for A: = 1, 2, 3, so that the cells received 
only Gaussian correlated noise ri™ and ti^*^ and constant excita- 
tory and inhibitory conductances. Time-dependent conductances 
were generated and used as inputs to a simulation of three model 
RGCs. Simulation length was sufficient to ensure significance 
of all reported deviations from PME fits (see section 4). We 
found that the spiking distributions were strikingly well-modeled 
by a PME fit, as shown in the righthand panel of Figure 2A; 
Dkl {P, P) is 2.90 X 10^^ bits. This result is consistent with the 
very good fits found experimentally in Shlens et al. (2006) under 
constant light stimulation. 

Next, we introduce temporal modulation into the full-field 
light stimuli such that each cell received the same stimulus, 
Sjt(t) = s(f), where s(f) refreshed every few milliseconds with 
an independently chosen value from one of several marginal 
distributions. For our initial set of experiments, the marginal 
distribution was either Gaussian (as in Ganmor et al, 2011) or 
binary (as used in Shlens et al, 2006). For both choices, we 
explored inputs with a range of standard deviations (1/16, 1/12, 
1 /8, 1 /6, 1 /4, 1 /3, or 1/2 of a baseline light intensity) and refresh 
rates (8, 40, or 100 ms). The shared stimulus produced strong 
pairwise correlation between conductances of neighboring cells. 
However, values of Dkl(-P, P) remained small, under 10^^ bits in 
all conditions tested. 

2.2.3. Impact of stimulus spatial scale 

We next asked whether PME models capture RGC responses to 
stimuli with varying spatial scales. We fixed stimulus dynamics 
to match the two cases that yielded the highest D-k_i{P, P) under 
the full-field protocol: for both Gaussian and binary stimuli, we 
used 8 ms refresh rate and a = 1/2. The stimulus was generated 
as a random checkerboard with squares of variable size; each 
square in the checkerboard, or stixel, was drawn independently 
from the appropriate marginal distribution and updated at the 
corresponding refresh rate. The conductance input to each RGC 
was then given by convolving the light stimulus with its receptive 
field, where the stimulus was positioned with a fixed rotation and 
translation relative to the receptive fields. This position was drawn 
randomly at the beginning of each simulation and held constant 
throughout (see insets of Figures 3B,C for examples, and section 
4 for further details). 

The RGC spike patterns remained very well described by PME 
models for the full range of spatial scales. Figure 3A shows this 
by plotting Dy,l(P, P) vs. stixel size. Values of D^i{P, P) increased 
with spatial scale, sharply rising beyond 128 |i,m, where a stixel 
had approximately the same size as a receptive field center, illus- 
trating that introducing spatial scale via stbcels produces even 
closer fits by PME models (the points at 512 [xm correspond to 
the full-field simulations). 

Values reported in Figure 3A are averages of Dkl (J*, P) pro- 
duced by five random stimulus positions. At stixel sizes of 128 [im 
and 256 (xm, the resulting spiking distributions differed signif- 
icantly from position to position; in Figure 3B, we show the 



probabilities of the distinct singlet [e.g., P(1,0, 0)] and dou- 
blet [e.g., P(l, 1,0)] spiking events produced at 256 |im. Each 
stimulus position created a "cloud" of dots (identified by color); 
large dots show the average over 20 sub-simulations. Each sub- 
simulation was identified by a small dot of the same color; because 
the simulations were very well-resolved, most of them were con- 
tained within the large dots (and hence not visible in the figure). 
Heterogeneity across stimulus positioning is indicated by the dis- 
tinct positioning of differently colored dots. At smaller spatial 
scales, the process of averaging stimuU over the receptive fields 
resulted in spiking distributions that were largely unchanged with 
stimulus position, as shown in Figure 3C, where singlet and dou- 
blet spiking probabilities are plotted for 60 \im stixels. Thus, 
filtered light inputs were largely homogeneous from cell to cell, 
as each receptive field sampled a similar number of indepen- 
dent, statistically identical inputs; the inset of Figure 3C shows 
the projection of input stbcels onto cell receptive fields from an 
example with 60 |xm stbcels. The resulting excitatory conduc- 
tances and spiking patterns were very close to cell-symmetric (see 
Figures S2B,C). 

By contrast, spiking patterns showed significant heterogene- 
ity from cell to cell when the stixel size was large, as illustrated 
in Figure 3B. This arises because each cell in the population may 
be located differently with respect to stbcel boundaries, and there- 
fore receive a distinct pattern of input activity; this is illustrated by 
the inset of Figure 3B, which shows the projection of input stbc- 
els onto cell receptive fields from one such simulation. However, 
PME models gave excellent fits to data regardless of heterogeneity 
in RGC responses (see Figures S2E,F); as seen in Figure 3A, over 
all 20 sub-simulations, and over all individual stixel positions, we 
found a maximal Dkl(-P, P) value of 0.0081 1. 

2.2.4. Conductance profiles and impact of stimulus filtering 

Intrigued by the consistent finding of low values of Dy,l(P, P) 
from the RGC model circuit despite stimulation by a wide vari- 
ety of highly correlated stimulus classes, we sought to further 
characterize the processing of light stimuli by this circuit. In par- 
ticular, we examined the effects of different marginal statistics of 
light stimuli, standard deviation of full-field flicker, and refresh 
rate on the marginal distributions of excitatory conductances. We 
focused on excitatory conductances because they exhibit stronger 
correlations than inhibitory conductances in ON parasol RGCs 
(Trong and Rieke, 2008). 

With constant light stimulation (no temporal modulation) the 
excitatory conductances were unimodal and broadly Gaussian 
(Figure 2A, middle panel). For a short refresh rate (8 ms) or 
small flicker size (standard deviation 1/6 or 1 /4 of baseline light 
intensity), temporal averaging via the filter and the approxi- 
mately linear form of over these light intensities produced 
a unimodal, modestly skewed distribution of excitatory con- 
ductances, regardless of whether the flicker was drawn from a 
Gaussian or binary distribution (see Figures 2B,C, center pan- 
els). For a slower refresh rate (100 ms) and large flicker size (s.d. 
1/3 or 1/2 of baseline light intensity), excitatory conductances 
had multi-modal and skewed features, again regardless of whether 
the flicker was drawn from a Gaussian or binary distribution 
(Figure 2D). Other parameters being equal, binary light input 
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FIGURE 2 I Results for RGC simulations with constant light and full-field 
flicker. (A-C) (Left) A liistogram and time series of stimulus, (center) a 
histogram of excitatory conductances and (right) the resulting distribution of 
spiking patterns. Stimuli are shown as deviations from a baseline intensity, 
expressed as a fraction of the baseline. Right panels show the probability 
distribution on spiking patterns P obtained from simulation ("Observed"; dark 
blue), and the corresponding pairwise approximation P ("PME"; light pink). 
Each row gives these results for a different stimulus condition. (A) No stimulus 
(Gaussian noise only). (B) Gaussian input, standard deviation 1 /6, refresh rate 



produced more skewed conductances. While some conductance 
distributions had multiple local maxima, these were never well 
separated, with the envelope of the distribution still resembling a 
skewed distribution. 



8 ms. (C) Binary input, standard deviation 1 /3, refresh rate 8 ms. (D) Binary 
input, standard deviation 1 /3, refresh rate 100 ms. For panel (D), the data in the 
left panel differs. (Left, top panel) The excitatory filter L'""^(t) (Equation 7) is 
shown instead of a stimulus histogram; (Left, bottom panel) the normalized 
excitatory conductance, as a function of time (red dashed line), is 
superimposed on the stimulus (blue solid). (Center) The histogram of excitatory 
conductances and (right) the resulting distribution of spiking patterns. Both the 
form of the filter and the conductance trace illustrate that the LN model that 
processes light input acts as a (time-shifted) high pass filter. 



The mechanism that leads to unimodal distributions of con- 
ductances, even when light stimuli are binary, is high-pass 
filtering — a consequence of the differentiating linear filter in 
Equation (7) and illustrated in Figure 2D. To demonstrate this. 
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we constructed an alternative filter with a more monophasic 
shape [Equation (9), illustrated in Figure SI] and compared the 
excitatory conductance distributions side-by-side. We saw a strik- 
ing difference in the response to long time scale, binary stimuli: 
the distributions produced by the monophasic filter reflected the 
bimodal shape of the input. Interestingly, the resulting simulation 
produced eight- times greater Dkl(P, P) (Figure 4). This suggests 
that greater D-k_i{P, P) may occur when ganglion cell inputs are 
primarily characterized via monophasic filters, e.g., at low mean 
light levels for which the retinal circuit acts to primarily integrate, 
rather than differentiate over time. 

In Figure 4A, we examine this effect over all full- field stimu- 
lus conditions by plotting D^i(P, P) from simulations with the 
monophasic filter, against Dy,l{P, P) from simulations in which 
the original filter was used with the same stimulus type. An 
increase in Dxl(P, P) was observed across stimulus conditions, 
with a markedly larger effect for longer refresh rates. This con- 
sistent change could not be attributed to changes in lower order 



statistics; there was no consistent relationship between the change 
in pairwise model performance and either firing rate or pairwise 
correlations (data not shown). Instead, large effects in Drl were 
accompanied by a striking increase in the bi- or multi-modality of 
excitatory conductances (see Figure 4B). In Figure 4C, we show 
an example stimulus and excitatory current trace taken from the 
simulation shown in Figure 4B: the monophasic filter allows the 
excitatory synaptic currents to track a long-timescale, bimodal 
stimulus with higher fidelity, transferring the bimodality of the 
stimulus into the synaptic currents. This finding was robust to 
specifics of the filtering process; we were able to reproduce the 
same results by designing integrating filters in different ways (data 
not shown). 

2.2.5. Recurrent connectivity in the RGC circuit 

We next considered the role of recurrence in shaping higher- 
order interactions by incorporating gap junction coupling into 
our simulations. We did this separately for each full-field stimulus 




FIGURE 3 I Results for RGC simulations with light stimuli of varying 
spatial scale ("stixels"). (A) Average Dkl(P, P) as a function of stixel 
size. Values were averaged over five stimulus positions, each with a 
different (random) stimulus rotation and translation; 512 ixm corresponds 
to full-field stimuli. For the rest of the panels, data from the binary 
light distributions is shown; results from the Gaussian case are similar. 
(B,C) Probability of singlet and doublet spiking events, under stimulation 
by movies of 256 M-m (B) and 60 (im (C) stixels. Event probabilities are 
plotted in 3-space, with the x, y, and z axes identifying the singlet 



(doublet) events 001 (Oil), 010 (101), and 100 (110), respectively. The 
black dashed line indicates perfect cell-to-cell homogeneity 
(e.g., P[(l, 0, 0)] = P[(0, 1, 0)J = P[(0, 0, 1)]). Both individual runs (dots) 
and averages over 20 runs (large circles) are shown, with averages 
outlined in black (singlet) and gray (doublet). Different colors indicate 
different stimulus positions. Insets: contour lines of the three receptive 
fields (at the 1 and 2 SD contour lines for the receptive field center; 
and at the zero contour line) superimposed on the stimulus checkerboard 
(for illustration, pictured in an alternating black/white pattern). 
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FIGURE 4 I Comparison of RGC simulations computed with the 
original ON parasol filter, vs. simulations using a more monophasic 
filter. (A) Dkl(P, P) for original vs. monophasic filter. Data is organized 
by stimulus refresh rate (8, 40, and 100 ms) and marginal statistics 
(Gaussian vs. binary). (B) Histograms of excitatory conductances for an 
illustrative stimulus class, under original (top) and monophasic (bottom) 



filters. The marginal statistics and refresh rate are illustrated by icons 
inside black circles; here, binary stimuli with refresh rate 100 ms. The 
input standard deviation (expressed as a fraction of baseline light 
intensity) was 1/2. (C) Time course of stimulus and resulting excitatory 
conductances, from simulation shown in (B): original (top) 
vs. monophasic (bottom) filters. 
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condition described earlier. In each case, we added gap junction 
coupling with strengths from 1 to 16 times an experimentally 
measured value (Trong and Rieke, 2008), and compared the 
resulting Dkl with that obtained without recurrent coupling 
(Figure 5). 

At the experimentally measured coupling strength {g^'^P = 
1.1 nS) itself, the fit of the pairwise model barely changed 
(Figure 5A) from the model without coupling. At twice the mea- 
sured coupling strength (g^^P = 2.2 nS), recurrent coupling had 
increased higher-order interactions, as measured by larger values 
of Dkl for all tested stimulus conditions. Higher order inter- 
actions could be further increased, particularly for long refresh 
rates (100ms), by increasing the coupling strength to four or 
eight times its baseline level (g^^P = 4.4 nS or g^'^'^ = 8.8 nS; see 
Figures 5B,C). Consistent with the intuition that very strong cou- 
pling leads to "all-or-none" spiking patterns, D^i^(P, P) decreased 
as gS^P increased further, often to a level below what was seen in 
the absence of coupling (Figure 5D). In summary, the impact of 
coupling on Dkl is maximized at intermediate values of the cou- 
pling strength. However, the impact of recurrent coupling on the 
maximal values of Dkl evoked by visual stimuli is small over- 
all, and almost negligible for experimentally measured coupling 
strengths. 

2.2.B. Modeling heavy-tailed light stimuli in the RGC circuit 

Finally, we repeated the full-field, recurrent, and alternate filter 
simulations previously described with light stimuli drawn from 
either Cauchy or heavy-tailed distributions: such distributions 



have been found to model the frequency of occurrence of lumi- 
nance values in photographs of natural scenes (Ruderman and 
Bialek, 1994). In contrast to previous results with Gaussian and 
bimodal inputs, here we found very low Dkl (J*, P) over all stimu- 
lus conditions: the largest values found were more than an order 
of magnitude smaller than those obtained earlier. Specifically, for 
all conditions, we found Dkl{P, P) < 4.5 x 10^"*, over all 42 net- 
work realizations; for many simulations, this number did not 
meet a threshold for statistical significance (see section 4.1.7), 
indicating that P and P were not statistically distinguishable. 
Using a more monophasic filter resulted in no apparent con- 
sistent change to D-^i{P, P). When gap junction coupling was 
added, Dkl (J", P) was maximized at an intermediate value; when 
ggap _ g simulations produced a statistically significant 

Dxl(P, P) ~ 3 — 4 X 10^^. However, overall levels remained rel- 
atively low, roughly 1/2 the value achieved with Gaussian or 
binary stimuli. 

To explain these findings, we examined the excitatory input 
currents: we found that over a broad range of refresh rates and 
stimulus variances, the marginal distributions of excitatory input 
conductances produced were remarkably unimodal in shape, 
and showed little skewness (Figure 6A). By examining the time 
evolution of the filtered stimuli (see Figure 6B), we see that heavy- 
tailed distributions allow rare, large events, but at the expense of 
medium-size events which explore the fuU range of the linear- 
nonlinear model used for stimulus processing (compare the blue 
with the red/green traces). When combined with the Gaussian 
background noise, this produces near-Gaussian excitatory con- 
ductances and, as may be expected from our original fall-field 
simulations, very low Dkl- 

We hypothesize that the methodology of averaging over the 
entire stimulus ensemble may not capture the significance of rare 
events that may individually be detected with high fidelity: Dkl 
was low even for full-field, high variance stimuli, which presum- 
ably caused (infrequent) global spiking events. Additionally, an 
important avenue for future work would be to test the ability 
of our RGC model, which was trained on Gaussian stimuli, to 
accurately model the response of a ganglion cell to stimuli whose 
variance is dominated by large events. Recent work examining 
the adaptation of retinal filtering properties to higher-order input 
statistics found little evidence of adaptation; however, the stimuli 
used in this work incorporated significant kurtosis but not heavy 
tails (Tkaciketal, 2012). 

2.2. 7. Summary of findings for RG C circuit 

In summary, we probed the spiking response of a small array 
of RGC models to changes in light stimuli, gap junction cou- 
pling, and stimulus filtering properties, and identified two cir- 
cumstances in which higher-order interactions were robustly 
generated in the spiking response. First, higher-order interac- 
tions were generated when excitatory currents had bimodal 
structure; we observed such structure when bimodal light stim- 
uli was processed by a relatively monophasic filter. Secondly, 
higher-order interactions were maximized at an intermediate 
value of gap junction coupling; this value was, however, much 
larger (eight times) than the experimentally observed coupling 
strength. 
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FIGURE 5 I The impact of recurrent coupling on RGC networl<s with 
full-field visual stimuli. Tlie strengtli of gap junction connections was 
varied from a baseline level (relative magnitude g = 1 , or absolute 
magnitude g^^'P = 1.1 nS) to an order of magnitude larger {g= 16, or 
ggap _ -fj g i^g) |p ggj,f^ panel, Dki(P, P) obtained with coupling is plotted 
vs. the value obtained for the same stimulus ensemble without coupling, 
for each of 42 different stimulus ensembles. (A) gS^'P = 1.1 nS 
(experimentally observed value); (B) g9=P = 4.4 nS; (C) gS'^'P = 8.8 nS; (D) 
ggap = 17.6 nS. 
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FIGURE 6 I Results for RGC simulations witii iieavy-tailed inputs. 

(A) Histograms of excitatory conductances, for tine original (left) 
vs. monophasic (right) filter. The marginal statistics are heavy-tailed skew 
(top) and Cauchy (bottom) inputs, and refresh rate is 40 ms for both panels. 
The input standard deviation (expressed as a fraction of baseline light 
intensity) was 1 /2 for both simulations. (B) Sample 100 ms stimuli, filtered 
by the original linear filter Lexc (top) and altered, monophasic filter 
Z.exc,M(t>ottoiT^)- Cauchy (blue solid), Gaussian (red dashed), and bimodal 
(green dash-dotted) stimuli are shown. 



2.3. A SIMPLIFIED CIRCUIT THAT EXPLAINS TRENDS IN RGC CELL 

MODEL 
2.3. 1. Setup and motivation 

In the previous section, we developed results for a computational 
model tuned to a very specific cell type; we now ask whether these 
findings wiU hold ior a more general class of neural circuits, or 
whether they are the consequence of system-specific features. To 
answer this question, we considered a simplified model of neu- 
ral spiking: a feedforward circuit in which three spiking cells 
sum their inputs and spike according to whether or not they 
cross a threshold. Such highly idealized models of spiking have 
a long history in neuroscience (McCulloch and Pitts, 1943) and 
have been recently shown to predict the pairwise and higher- 
order activity of neural groups in both neural recordings and 



more complex dynamical spiking models (Nowotny and Huerta, 
2003; Tchumatchenko et al, 2010; Yu et al, 2011; Leen and 
Shea-Brown, 2013). 

In more detail, each cell received an independent input 
Ij and a "triplet" — (global) input Ic that is shared among all 
three cells. Comparison of the total input Sj = Ic + Ij with a 
threshold 8 determined whether or not the cell spiked in that 
random draw. An additional parameter, c, identified the frac- 
tion of the total input variance originating from the global 
input; that is, c = Var[7c]/Var[7c + Ij]- The global input was cho- 
sen from one of several marginal distributions, which included 
those used in the RGC model: Gaussian, bimodal, and heavy- 
tailed. The independent inputs Ij were, in all cases, chosen from 
a Gaussian distribution, consistent with our RGC model. When 
the common inputs are Gaussian, our model is equivalent to 
the Dichotomized Gaussian model previously studied by several 
groups (Amari et al, 2003; Macke et al, 2009, 2011; Yu et al, 
2011), cf. (Tchumatchenko et al., 2010). For further details, see 
section 4.2. 

In the RGC model large effects in Dkl were accompanied by 
a striking increase in the bi- or multi-modality of excitatory con- 
ductances. Why are bimodal inputs, shared across cells, able to 
produce spiking responses that deviate from the pairwise model? 
We use our simple thresholding model to provide some intu- 
ition for how bimodal common inputs to thresholding cells lead 
to spiking probabilities that violate the constraints (Equation 3) 
which must hold for the pairwise model. For example, suppose 
that the common input Ic can take on values that cluster around 
two separated values, |Xa < (Xb, but rarely in the interval between; 
that is, the distribution of Ic is bimodal. If [xb is large enough 
to push the cells over threshold but [Ia is not, then we see that 
any contribution to the right-hand side of Equation (3), pi/pi, 
depends only on the distribution of the independent inputs Ij; 
if either one or two cells spike, then the common input must 
have been drawn from the cluster of values around (jl^v, because 
otherwise all three cells would have spiked. 

To be concrete, let P[x] refer to the probability of spiking event 
X = {xi, X2, x^), and P[x | Ic ~ [Ia] refer to the probability that x 
occurs, conditioned on the event ^ M-A- Then 

P [(1, 0, 0)] = P [(1, 0, 0) I 7, ^ \ia\ P [Ic ~ |Xa] 

-FP [(1,0,0) I /c ~ IXb] P [7. ~ (Xb] 

= P [(1,0,0) \Ic^ilA]PlIc~[lA] 

because P [(1, 0, 0) | 7^ ~ |Xb] = 0: for the same reason, 
[(1, 1, 0)] = P [(1, 1, 0)\lc~ \ia] P Uc ~ [Xa] 

therefore 

p2 ^ P[(1.1,0) |7, ^|Xa]P[Jc~|Xa] 

Pl P[{l,0,0)\Ic^lLA]P[Ic~ilA] 

^ P[(l,1.0) |7,^|Xa] 
P[(1,0,0) |7, «jxa] 
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On the other hand, 

P3 ^ P Uc ^[ib]+P Kh 1. 1) I ~ [Ia] P [Ic ~ [Ia] 

P0~ P [{0,0,0)\ I, ^iLA]PUc^V^A] 

By changing the relative likeHhood of drawing the common 
input from one cluster or the other, without changing the 
values of [la and \ib themselves (that is, change P[Ic ~ [xb] 
and P[Ic^[Ia] but leave the conditional probabilities (e.g., 
P[(l, 0, 0) I 7c ~ M-a]) fixed) one may change the ratio pa/po 
without changing the ratio pi/ pi- Hence the constraint specify- 
ing those network responses exactly describable by PME models 
can be violated when the common input is bimodal. 

In contrast, we may instead consider a unimodal common 
input, of which a Gaussian is a natural example. Here, the dis- 
tribution of the common input Ic is completely described by its 
mean and variance; both parameters can impact the ratio p^ /po 
(by altering the likelihood that the common input alone can trig- 
ger spikes) and the ratio pi/pi- Each value of is consistent with 
both events pi and p2, with the relative likelihood of each event 
depending on the specific value of Ic\ it is no longer clear how to 
separate the two events. In the following sections, we will confirm 
this intuition by direct evaluation of the resulting departure from 
pairwise statistics. 

2.3.2. Model input distributions 

Motivated by our observations of excitatory currents that arose 
in the RGC model, we chose several input distributions that 
allow us to explore other salient features, such as symmetry 
and the probability of large events. A distribution is called sub- 
Gaussian if the probability of large events decays rapidly with 
event size, so that it can be bounded above by a scaled Gaussian 
distribution (see section 4). We considered two sub-Gaussian dis- 
tributions; the Gaussian itself, and a skewed distribution with 
a sub-Gaussian tail (hereafter referred to as "skewed"). We also 
considered the two "heavy-tailed" distributions used as stimuli to 
the RGC model — the Cauchy distribution, and a skewed distribu- 
tion with a Cauchy-like tail (hereafter referred to as "heavy-tailed 
skewed"). In these distributions, the probability of large events 
decays polynomially rather than exponentially. 

For each choice of common input marginal, we varied the 
input parameters so as to explore a fall range of firing rates and 
pairwise correlations: specifically, we varied the input correlation 
coefficient c in the range [0, 1], the total input standard deviation 
cr in the range [0, 4], and the threshold © in [0, 3]. In all cases 
the independent inputs Ij were chosen from a Gaussian distribu- 
tion [of variance (1 — c)a^ ] . For each choice of input parameters, 
we determine the resulting distribution on spiking states (as 
described in section 4) and compute the PME approximation. 

2.3.3. Unimodal common inputs fail to produce significant 
higher-order interactions in three-cell feedforward circuits 

We first considered common inputs chosen from a unimodal 
(e.g., Gaussian) distribution. If is Gaussian, then the joint dis- 
tribution of S = (Si , S2, S3) is multivariate normal, and therefore 
characterized entirely by its means and covariances. Because the 
PME fit to a continuous distribution is precisely the multivari- 
ate normal that is consistent with the first and second moments. 



every such input distribution on S exactly coincides with its 
PME fit. However, even with Gaussian inputs, outputs (which 
are now in the binary state space {0, 1}^) wiU deviate from the 
PME fit (Amari et al, 2003; Macke et al, 2009). As shown below, 
non-Gaussian unimodal inputs can produce outputs with larger 
deviations. Nonetheless, these deviations are small for all cases 
in which inputs were chosen from a sub-Gaussian distribution, 
and PME models are quite accurate descriptions of circuits with a 
broad range of unimodal inputs. 

We first considered circuits with either Gaussian or skewed 
common inputs. Over the full range of input parameters, distri- 
butions remained well fit by the pairwise model, with a maximum 
value of Dkl(P, P) (of 0.0038 and 0.0035 for Gaussian and 
skewed inputs, respectively) achieved for high correlation val- 
ues and cr comparable to threshold. In Figure 7 A we illustrate 
these trends with a contour plot of Dkl(P, P) for a fixed value 
of threshold (here, 0 = 1.5) and Gaussian common inputs (the 
analogous plot for skewed inputs is qualitatively very similar. 
Figure S3 A). 

Clear patterns also emerged when we viewed Dkl(P, P) as 
a function of output spiking statistics rather than input statis- 
tics (as in Macke et al., 2011). Non-linear spike generation can 
produce substantial differences between input and output cor- 
relations; this relationship can vary widely based on the specific 
non-linearity (Moreno et al., 2002; de la Rocha et al, 2007; 
Marella and Ermentrout, 2008; Shea-Brown et al, 2008; Vilela 
and Lindner, 2009; Barreiro et al, 2010, 2012; Tchumatchenko 
et al, 2010; Hong et al, 2012). Figure 7B shows Dkl(P, P) and A 
for all threshold values (including the data shown in Figure 7A), 
but now plotted with respect to the output firing rate. The data 
were segregated according to the Pearson's correlation coeffi- 
cient p between the responses of cell pairs (p = Cov(x,,xj )^^ _ 

^Var(x,)Var(xj) 

2 

ix'(7-ii) ^'^^ ^ fixed correlation, there was generally a one-to-one 
relationship between firing rate and Dkl(P, P)- For these distri- 
butions (Figure 7B, for Gaussian inputs; skewed inputs shown in 
Figure S3B), Dkl(P, P) was maximized at an intermediate firing 
rate. Additionally, D^i(P, P) had a non-monotonic relationship 
with spike correlation: it increased from zero for low values of 
correlation, obtained a maximum for an intermediate value, and 
then decreased. These limiting behaviors agree with intuition: a 
spike pattern that is completely uncorrected can be described by 
an independent distribution (a special case of PME model), and 
one that is perfectly correlated can be completely described via 
(perfect) pairwise interactions alone. 

We next considered circuits in which inputs were drawn from 
one of two heavy-tailed distributions, the Cauchy distribution 
and a heavy-tailed skewed distribution, defined earlier. Here, dis- 
tinctly different patterns emerge: for a fixed 0, Dkl(P, P) is 
maximized in regions of high input correlation and high input 
variance 0, but relatively high values of Dkl are achievable 
across a wide range of input values (see Figure 7C for Cauchy 
inputs; heavy- tailed skewed in Figure S3C). However, the max- 
imum achievable values of Dkl were achieved at intermediate 
output correlations p ~ 0.4 (see Figure 7D for Cauchy inputs; 
heavy-tailed skewed shown in Figure S3D); this suggests that high 
input correlations do not result in high output correlations. 
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FIGURE 7 I Strength of higher-order interactions produced by the 
threshold model as input parameters vary, and the relationship of 
these higher-order interactions with other output firing statistics. 

(A) For Gaussian common inputs: Dki(P, P) as a function of input 
correlation c and input standard deviation a, for a fixed threshold 0= 1.5. 
Color indicates Dkl(P. P)', see color bar for range. (B) For Gaussian 
common inputs: Dkl(P, P) vs. firing rate (Left) and the fraction of 
multi-information (A) captured by the PME model vs. firing rate (Right). 



Each dot represents the value obtained from a single choice of the input 
parameters c, a, and 0; input parameters were varied over a broad range 
as described in section 2. Firing rate is defined as the probability of a 
spike occurring per cell per random draw of the sum-and-threshold model, 
as defined in Equation (16). Color indicates output correlation coefficient p 
ranging from black for p e (0, 0.1), to white for p e (0.9, 1), as illustrated in 
the color bars. (CD): as in (A,B), but for Cauchy common inputs. (E,F): as 
in (A,B), but for bimodal common inputs. 



This somewhat unintuitive finding may be explained by the 
structure of the PDF of a heavy-tailed common input, which 
favors (infrequent) large events at the expense of medium- 
size events. For instance, the probability that a Cauchy input 
is above a given threshold (P[Ic > © > £[7^]]) is often much 
smaller than for a Gaussian distribution of the same vari- 
ance. However, an input can trigger at best one single spik- 
ing event regardless of size: therefore a Cauchy common input 
generates fewer correlated spiking events with larger inputs, 
while a Gaussian common input triggers correlated spiking 
events with smaller, but more frequent, input values. As a 
result, heavy-tailed inputs are unable to explore the full range 
of output firing statistics: Figure 7D shows that high out- 
put correlations only occur at very low firing rates. Overall, 
Dkl{P, P) reaches higher numerical values than for sub-Gaussian 
inputs, possibly reflecting the higher-order statistics in the input. 
However, the maximal D-k_i(P, P) attained still falls far short 
of exploring the full range of possible values (compare with 
Figure IB). 



Finally, we examine the behavior of the strain, which 
quantifies both the magnitude and sign of deviation from 
the pairwise model (see Ohiorhenuan and Victor, 2010). It 
has been previously observed that the strain is negative for 
the DG model (Macke et al., 2011), a condition that has 
been related to sparsity of the neural code and with which 
our results agree (data not shown). However, we found that 
any other choice of input marginal statistics, both posi- 
tive and negative values are seen; for heavy-tailed common 
inputs, positive values predominated except at very low firing 
rates. 

2.3.4. Bimodal triplet inputs can generate higher-order interactions 
in three-cell feedforward circuits 

Having shown that a wide range of unimodal common inputs 
produced spike patterns that are well-approximated by PME fits, 
we next examined bimodal common inputs. Such inputs sub- 
stantially increased departures from PME fits in the ganglion cell 
models described above. As in the previous section, we varied c. 
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a, and © so as to explore a full range of firing rates and pairwise 
correlations. 

As a function of input parameter values, DklCJ", P) is maxi- 
mized for large input correlation and moderate input variance 
[see Figure 7E, which illustrates D^i^(P, P) for a fixed threshold 

0 = 1.5]. Figure 7F shows D^i(P, P) values as a function of the 
firing rate and pairwise correlation elicited by the fiill range of 
possible bimodal inputs. We see that D^(P, P) is maximized at 
an intermediate (but relatively high: v 0.4) firing rate, and for 
intermediate-to-large correlation values (p ~ 0.6 — 0.8). 

We find distinctly different results when we view A 
(Equation 1), for these same simulations, as a function of output 
spiking statistics (right panels of Figures 7B,D,F). For unimodal, 
sub-Gaussian distributions (Figure 7B), A is very close to 1, with 
the few exceptions at extreme firing rates. For heavy-tailed and 
bimodal inputs (Figures 7D,F), A may be appreciably far from 

1 (as small as 0.5) with the smallest numbers (suggesting a poor 
fit of the pairwise model) occurring for low correlation p. This 
highlights one interesting example where these two metrics for 
judging the quality of the pairwise model, D^i^(P, P) and A, yield 
contrasting results. 

Finally, we emphasize that while bimodal inputs can produce 
greater higher-order interactions than unimodal inputs, the val- 
ues of DklC^", P) accessible by feedforward circuits with global 
inputs remain far below their upper bounds at any given fir- 
ing rate. The maximal values of DklCJ*, P) reached by Cauchy 
and heavy-tailed skewed inputs were 0.0078 and 0.0153; bimodal 
common inputs reached a maximal value of 0.091. This is an 
order of magnitude smaller than possible departures among sym- 
metric spike patterns (compare Figure IB). The difference is 
illustrated in Figure S4, which compares the Dki(P,P) values 
obtained in the thresholding model and those obtained by direct 
exhaustive search at each firing rate by superposing the datapoints 
on a single axis. 

2.3.5. Mathematical analysis of unimodal vs. bimodal effects 

The central finding above is that circuits with bimodal inputs can 

generate significantly greater higher-order interactions than cir- 
cuits with unimodal inputs. To probe this further, we investigated 
the behavior of DklC^*, P) for the feedforward threshold model 
with a perturbation expansion in the limit of small common 
input. We found that as the strength of common input signals 
increased, circuits with bimodal inputs diverged from the PME 
fit more rapidly than circuits with unimodal inputs; the full cal- 
culation is given in the Appendix. In brief, we determined the 
leading order behavior of Dkl(P, P) in the strength c of (weak) 
common input. Dkl(P, P) depended on for unimodal distri- 
butions, i.e., the low order terms in c dropped out; for symmetric 
unimodal distributions, such as a Gaussian, DklC?, P) grew as c*. 
For bimodal distributions, D^i(P, P) grew as c^. Because of the 
dependence, rather than or c*, as the strength of common input 
signals c increases, circuits with bimodal inputs are predicted to 
produce greater deviations from their PME fits. 

2.3.6. Impact of recurrent coupling 

We next modified our thresholding model to incorporate the 
effects of recurrent coupling among the spiking cells. To mimic 



gap junction coupling in the RGC circuit, we considered aU-to- 

all, excitatory coupling, and assumed that this coupling occurs on 
a faster timescale compared with the timescale over which inputs 
arrive at the cells. 

Our previous model was extended as follows: if the inputs 
arriving at each cell elicited any spikes, there was a second 
stage at which the input to each neuron receiving a connection 
from a spiking cell was increased by an amount g. This repre- 
sented a rapid depolarizing current, assumed for simplicity to add 
linearly to the input currents. If the second stage resulted in addi- 
tional spikes, the process was repeated: recipient cells received an 
additional current g, and their summed inputs were again thresh- 
olded. The sequence terminated when no new spikes occurred on 
a given stage; e.g., for N = 3, there were a maximum of three 
stages. The spike pattern recorded on a given trial was the total 
number of spikes generated across all stages. 

We then explored the impact of varying g for a single repre- 
sentative value of cr and 0, and several values of the correlation 
coefficient c. We found that as g increased D^(P,P) varied 
smoothly, reflecting the underlying changes in the spike count 
distribution. For small c (c = 0.02 shown in Figure 8A), where 
the variance of common input is very small, the results var- 
ied little by input type: for all input types I>kl(P,P) reached 
an interior maximum near g ~ 1.7. As c increases, the distinc- 
tions between inputs types become apparent (Figures 8B,C show 
c = 0.2, 0.5, respectively): for most input types and values of c, 
the value of Dyj^(P, P) reaches an interior maximum that exceeds 
its value without coupling (i.e., g = 0). However, overall values 
of D}a,{P, P) remained modest, never exceeding 0.01 across the 
values explored here. 

2.3.7. Summary of findings for simplified circuit model 

We examined a highly idealized model of neural spiking, so 
as to explore the generality of our earlier findings in a small 
array of RGC models. We found that our main results from the 
RGC model — that higher-order interactions were most signif- 
icant when inputs had bimodal structure, and that when fast 
excitatory recurrence was added to the circuit, higher-order inter- 
actions were maximized at an intermediate value of the recur- 
rence strength — persisted in this simplified model. Moreover, we 
were able to show that the first of these findings is general, in that 
it holds over a complete exploration of parameter space. 

2.4. SCALING OF HIGHER-ORDER INTERACTIONS WITH POPULATION 
SIZE 

The results above suggest that unimodal, rather than bimodal, 
input statistics contribute to the success of PME models. Next, 
we examined whether this conclusion continues to hold when we 
increase network size. The permutation-symmetric architectures 
we have considered so far can be scaled up to more than three cells 
in several natural ways; for example, we can study N cells with a 
global common input. 

We considered a sequence of models in which a set of N 
threshold spiking units received global input [with mean 0 and 
variance a^c] and an independent input Ij [with mean 0 and vari- 
ance 0^(1 — c)]. As for the three-cell network models considered 
previously, the output of each cell was determined by summing 



Frontiers in Computational Neuroscience 



www.frontiersin.org 



February 2014 | Volume 8 | Article 10 | 12 



Barreiro et al. 



Beyond-pairwise correlations in microcircuits 



A 




0 0.5 1 1.5 2 




9 

FIGURE 8 I The impact of recurrent coupling on thie three-cell 
sum-and-threshold model. Eacli plot shows Dkl(P, P) as a function of g, 
for a specific value of the correlation coefficient. In all panels, input 
standard deviation cr = 1 , threshold 0 = 1 .5, N = 3 and symbols are as 
described in the legend for (C). Abbreviations in the legend denote the 
marginal distribution of the common input: G, Gaussian; SK, skewed; C, 
Cauchy; HT, heavy-tailed skewed; B, bimodal. (A) For input correlation 
c = 0.02, (B) c = 0.2, and (C) c = 0.5. 



and thresholding these inputs. Upon computing the probability 
distribution of network outputs (section 4), we fit a PME distri- 
bution. Again, we explored a range of cr, c, and 0 and recorded the 
maximum value of Dy.l(P, P) between the observed distribution 
P and its PME fit P. Figure 9 shows this Dyh/N [i.e., entropy per 
cell (Macke et al., 2009)] for each class of marginal distributions. 

We found that the maximum Dy,l(P, P)/N increased roughly 
linearly with N for Gaussian, skewed and Cauchy inputs; for 
heavy- tailed skew and bimodal inputs, D^i^(P, P)/N appeared to 
saturate after an initial increase (Figure 9). The relative order- 
ing for unimodal inputs shifted as N increased; as N ^ 16, the 
maximal achievable D^i^(P, P) for sub-Gaussian inputs overtook 
the values for heavy-tailed inputs. At all values of N, the val- 
ues for Gaussian and skewed inputs tracked one another closely. 
Regardless, the values for all unimodal inputs remained substan- 
tially below the maximal value achievable for bimodal inputs. 
Figure 9B shows that the probability distributions produced by 
these inputs qualitatively agree with this trend: departures from 
PME were more visually pronounced for global bimodal inputs 
than for global unimodal inputs. In addition, the distributions for 
heavy-tailed and sub-Gaussian inputs differed qualitatively, offer- 
ing a potential mechanism for different scaling behavior. Using 
the relationship between Dkl and likelihood ratios (described 



in section 2.1), at N = 16, the value D^i/N ~ 0.1 for bimodal 
global inputs corresponds to a likelihood ratio of 0.33 that a sin- 
gle draw from P (single network output) in fact came from the 
PME fit P rather than from P; a likelihood <0.01 is reached for 
four draws. 

We next extended our model with recurrent coupling to N > 3 
cells. In addition to the parameters for the uncoupled network, 
we varied the coupling strength, g, for each type of input. As in 
the N = 3 network, coupling was all-to-aU. As for the small net- 
works explored in an earlier section, Dkl(P, P) generally peaked 
at an intermediate value of the coupling strength g; however, 
the value of g decreased as population size N increased (illus- 
trated in Figure lOA, for c = 0.2). This may be attributed to 
the increased potential impact of recurrence at larger popula- 
tion sizes; as N increases, the number of potential additional 
spikes that may be triggered increases; consequently the aver- 
age recurrent excitation received by each cell increases, and 
therefore the probability that one or two spikes will trigger a 
cascade to N spikes. In Figure lOB we demonstrate that the 
impact of this effect may be captured by plotting Dxl{P, P) as 
a function of an effective coupling parameter, g*N/3. Here, we 
plot the curves for six population sizes (N = 3, 4, 6, 8, 10, and 
12) and five common input types; each curve was scaled by 
normalizing Dkl(P, P) by its maximum value. For many sets 
of parameter values, the resulting curves line up remarkably 
well, suggesting a universal scaling with the effective coupling 
parameter. 

We also explored the overall possible impact of recurrence on 
higher-order interactions, by surveying a range of circuit param- 
eters c, a, & and g. The top panel of Figure IOC shows the 
maximal Dkl(P, P) per neuron, for each type of input, up to 
population size N = &. For unimodal inputs, recurrent coupling 
increased the available range of higher-order interactions mod- 
estly, compared with the range achieved with purely feedforward 
connections; however, these values remained significantly lower 
than those achieved for bimodal inputs. 

Finally, we considered how higher-order interactions scale 
with population sampling size. The spike pattern distributions 
used to generate the last column of data points {N = 8) in the 
top panel of Figure IOC were reanalyzed by sub-sampling the 
spike pattern distributions on < 8 cells. In each case, we chose 
our sub-population to be k nearest neighbors (for our setup, any 
subset of k cells is statistically identical). In the bottom panel of 
Figure IOC, we show the maximal value of Dkl(P, P) per sub- 
sampled cell achieved over all input parameters (the curves for 
Gaussian, skewed and Cauchy inputs are so close together so as to 
be visually indistinguishable). This number increases or remains 
steady as k increases, indicating that sub-sampling a coupled net- 
work will depress the apparent higher-order interactions in the 
output spiking pattern. 

To summarize, the greater impact of bimodal vs. unimodal 
input statistics on maximal values of D^i(P, P) persists in cir- 
cuits with N = 3 cells up to N = 16 cells. Overall, for the cir- 
cuit parameters producing maximal deviations from PME fits, 
it becomes easier to statistically distinguish between spiking dis- 
tributions and their PME fits as the number of cells increases in 
feedforward networks. 
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FIGURE 9 I The significance of hiigher-order interactions increases 
with network size. (A) Normalized maximal deviation, Dkl(P, from 
the PME fit for the thresholding circuit model as network size W 
increases. For each W and common input distribution type, possible input 
parameters were in the following ranges: input correlation ce [0. 1], input 
standard deciation oe[0, 4J, and threshold 0e[O, 3J. (B) Example 



sample distributions for different types of common input: from top, 
bimodal, Gaussian, heavy-tailed skew, and Cauchy common inputs. For 
each input type, the distribution that maximized Dki(P, P) for W= 16 is 
shown. Each distribution is illustrated with a bar plot contrasting the 
probabilities of spiking events in the true (dark blue) vs. pairwise 
maximum entropy (light pink) distributions. 




FIGURE 10 I The impact of recurrent coupling on the sum-and-threshold 
model, for increasing population size. (A) Dkl(P, P) as a function of the 
coupling coefficient, g, for a specific value of population size W. In all plots, 
input standard deviation cr = 1 , threshold 9 = 1.5 and input correlation 
c = 0.2. From top: W = 4; W = 8; W = 12. (B) D^i;"'{P, P) as a function of the 
coupling coefficient, g, for populations sizes N = 3 — 12. For each curve, 
Dkl(P, P) was scaled by its maximal value and plotted as a function of the 
scaled coupling coefficient, g*W/3, to illustrate a universal scaling with 
effective coupling strength. The line style of each curve indicates the 
population size W, as listed in the legend. The marker and line color indicate 



the common input marginal, as listed in the legend for (A). (C) (Top) Maximal 
value of Dkl(P. P)/W, achieved over a survey of parameter values c, a, ®, 
and g, as a function of the population size N (solid lines). For each input 
marginal type, a second curve shows the maximal value obtained over only 
feed-forward simulations (g = 0; dashed lines). The marker and line color 
indicate the common input marginal, as listed in the legend for (A). (Bottom) 
Maximal value of Dkl(P, P)/k, achieved over a survey of parameter values c, 
o, 0, and g, as a function of the subsample population size k. Data was 
subsampled from the W = 8 data shown in the top panel, by restricting 
analysis to k out of N cells. 
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3. DISCUSSION 

We used mechanistic models to identify input patterns and cir- 
cuit mechanisms which produce spike patterns with significant 
higher-order interactions — that is, with substantial deviations 
from predictions under a PME model. We focused on a tractable 
setting of small, symmetric circuits with common inputs. This 
revealed several general principles. First, we found that these 
circuits produced outputs that were much closer to PME predic- 
tions than required for a general spiking pattern. Second, bimodal 
input distributions produced stronger higher-order interactions 
than unimodal distributions. Third, recurrent excitatory or gap 
junction coupling could produce a further, moderate increase of 
higher-order correlations; the effect was greatest for coupling of 
intermediate strength. 

These general results held for both an abstract threshold- 
and-spike model and for networks of non-linear integrate-and- 
fire units based on measured properties of one class of RGCs. 
Together with the facts that ON parasol cell filtering suppresses 
bimodality in light input, and that coupling among ON parasol 
cells is relatively weak, our findings provide an explanation for 
why their population activity is well captured by PME models. 

3.1. COMPARISON WITH EMPIRICAL STUDIES 

How do our maximum entropy fits compare with empirical stud- 
ies? In terms of D-kl(P, P) — equivalently, the logarithm of the 
average relative likelihood that a sequence of data drawn from P 
was instead drawn from the model P — numbers obtained from 
our RGC models are very similar to those obtained by in vitro 
experiments on primate RGCs (Shlens et al, 2006, 2009). For 
example, in a survey of 20 numerical experiments under con- 
stant light conditions (each of length 100 ms, with spikes binned 
in 10 ms intervals), we find that D^{P, P) ranges between 0 and 
0.00029: similarly excellent fits were found by Shlens et al. (2006) 
(in which cell triplets were stimulated by constant light for 60 s 
with spikes binned at 10 ms), with one example given of 0.0008 
(inferred from a reported likelihood ratio of 0.99944). These 
values can increase by an order of magnitude under full-field 
stimulation, as well as spatio-temporally varying stixel simula- 
tions (bounded above by 0.007). We can view the 60 [im stixel 
simulations as a model of the checkerboard experiments of Shlens 
et al. (2006), for which close fits by the PME distribution were also 
observed (likelihood numbers were not reported). Similarly, the 
values of A produced by our RGC model are close to those found 
by Schneidman et al. (2006); Shlens et al. (2006) under compa- 
rable stimulus conditions. We obtain A = 99.5% (for cell group 
size N = 3) under constant illumination, which is near the range 
reported by Shlens et al. (2006) for the same bin size and stimulus 
conditions (98.6 ± 0.5, AT = 3 — 7). For fiiU-field stimuK we find 
a range of numbers from 95.7% to 99.3% {N = 3). 

With regard to the circuit mechanisms behind these excellent 
fits by pairwise models, the findings that most directly address 
the experimental settings of Shlens et al. (2006, 2009), are (1) 
the finding that in the threshold model, unimodal inputs generate 
minimal higher-order interactions, compared to bimodal inputs, 
and (2) the particular stimulus filtering properties of parasol cells 
can suppress bimodality that may be present in an input stimu- 
lus, resulting in a unimodal distribution of input currents. First, 



we believe that unimodal inputs are consistent with the white- 
noise checkboard stimuli used in Shlens et al. (2006, 2009), where 
binary pixels were chosen to be small relative to the receptive 
field size; averaged over the spatial receptive field, they would 
be expected to yield a single Gaussian input by the central limit 
theorem. Second, temporal filtering may contribute to receipt of 
unimodal conductance inputs by cells for the full-field binary 
flicker stimuli that are delivered in Schneidman et al. (2006). With 
the 16.7 ms refresh rate used there, under the assumption that the 
filter time-scale of the cells studied in that paper is roughly similar 
to that of the ON parasol cell we consider, the filter would aver- 
age a binary (and hence bimodal) stimulus into a unimodal shape 
(see Figure 2C, for example). 

The simple threshold models that we have considered, mean- 
while, give us a roadmap for how circuits could be driven in 
such a way as to lower A. The right columns of Figures 7B,D,F 
show A plotted as a function of firing rate for circuits of N = 3 
cells receiving global common inputs; we observe that A ~ 1 for 
Gaussian inputs over a broad range of firing rates and pairwise 
correlation coefficients, but that values of A can be depressed 
by 25-50% in the presence of a bimodal common input. Indeed, 
Shlens et al. (2006) showed that adding global bimodal inputs to 
a purely pairwise model can lead to a comparable departure in A. 
Our results are consistent with this finding, and explicitly demon- 
strate that the bimodality of the inputs — as well as their global 
projection — are characteristics that lead to this departure. 

3.2. CONSEQUENCES FOR SPECIFIC NEURAL CIRCUITS 

Our results make predictions about when neural circuits are likely 
to generate higher-order interactions. A comprehensive study of 
our simple thresholding model shows that bimodal inputs gen- 
erate greater beyond-pairwise interactions than unimodal inputs. 
This result can be extended to other circuits where a clear input- 
output relationship exists, and be used to predict higher-order 
correlations by analyzing the impact of stimulus filtering on a 
statistically defined class of inputs. For example, the effect holds 
in our model of primate ON parasol cells, where a biphasic fil- 
ter suppresses bimodality in a stimulus with a timescale matched 
to that of the filter. We can use these results to extrapolate to 
other classes of RGCs or other stimulus conditions in which fil- 
ters are less biphasic (Victor, 1999). Indeed, when we process long 
time-scale bimodal inputs through a preliminary model of the 
midget cell circuit, stimulus bimodality is no longer suppressed 
and is associated with higher-order interactions (see Figure 4). 
We predict that greater higher-order interactions will be found 
for stimuli or RGC circuits that elicit bimodal activity that is 
thresholded when generating spikes — in comparison to the para- 
sol circuits and stimuli studied in Shlens et al. (2006, 2009). 
We believe that this principle wiU be further applicable in other 
sensory systems. 

We found that recurrent excitatory connections further 
increase higher-order interactions, which are maximized at an 
intermediate recurrence strength; in particular, when the strength 
of an excitatory recurrent input was comparable to the distance 
between rest and threshold (Figure 8). For the primate ON para- 
sol cells we considered, the experimentally measured strength of 
gap junction coupling would lead to an estimated membrane 



Frontiers in Computational Neuroscience 



www.frontiersin.org 



February 2014 | Volume 8 | Article 10 | 15 



Barreiro et al. 



Beyond-pairwise correlations in microcircuits 



voltage jump of ~1 mV in response to the firing of a neighbor- 
ing RGC, whUe the voltage distance between the resting voltage 
and an approximate threshold is about 5-10 mV (Trong and 
Rieke, 2008). Consistent with this estimate, we found that in 
our ON parasol cell model, higher-order interactions were maxi- 
mized when the strength of excitatory recurrence was eight times 
its experimentally measured value. The experimentally measured 
values of recurrence had little or no effect on higher-order inter- 
actions. We anticipate that this result may be used to predict 
whether recurrent coupling plays a role in generating higher- 
order interactions in other circuits where the average voltage 
jump produced by an electrical or synaptic connection can be 
measured. 

To apply our findings to real circuits, we must also consider 
population size. A measurement from a neural circuit, in most 
cases, will be a subsample of a much larger, complete circuit. 
We addressed this question where it was computationally more 
tractable, for the thresholding model. Here, we found that the 
impact of higher-order interactions, as measured by entropy per 
cell unaccounted for by the pairwise model (Din^/k), increases 
moderately as subsample size k increases. Since recurrent con- 
nectivity in our model is truly global, this is consistent with the 
suggestion of Roudi et al. (2009a) and others that the entropy 
can be expected to scale extensively with population size N, once 
N significantly exceeds the true spatial connectivity footprint: we 
may see different results with limited, local connectivity. 

3.3. SCOPE AND OPEN QUESTIONS 

There are many aspects of circuits left unexplored by our study. 
Prominent among these is heterogeneity. Only a few of our sim- 
ulations produce heterogeneous inputs to model RGCs, and all 
of our studies apply to cells with identical response properties. 
This is in contrast to studies such as Schneidman et al. (2006), 
which examine correlation structures among multiple cell types. 
For larger networks, feedforward connections with variable spa- 
tial profiles also occur, between the extremes of independent and 
global input connections examined here. It is also possible that 
more complex input statistics could lead to greater higher-order 
interactions (Bethge and Berens, 2008). Finally, Figure 9 indicates 
that some trends in DklC^, P) vs. N appear to become non-linear 
for N ^ 10; for larger networks, our qualitative findings could 
change. 

Our study also leaves largely open the role of different reti- 
nal filters in generating higher-order interactions. We have found 
that the specific filtering properties of ON parasol cells sup- 
press bimodality in light inputs, suggesting that other classes of 
RGCs, such as midget cells, may produce more robust higher- 
order interactions (compare panels in Figure 4B). This predicts 
a specific mechanism for the development of higher-order inter- 
actions in preparations that include multiple classes of ganglion 
cells (Schneidman et al., 2006). For a complete picture, future 
studies will also need to account for the possible adaptation of 
stimulus filters in response to higher-order stimulus character- 
istics (Tkacik et al., 2012); we did not consider the latter effect 
here, where our filter was fit to the response of a cell to Gaussian 
stimuli with specific mean and variance. An allied possibility is 
that multiple filters will be required, as was found when fitting 



the responses of salamander retinal cells to LN models (Fairhall 
et al., 2006). Distinguishing the roles of linear filters vs. static 
non-linearities in determining which stimulus classes will give 
the greatest higher-order correlations is another important step. 
Finally, we considered circuits with a single step of inputs and 
simple excitatory or gap junction coupling; a plethora of other 
network features could also lead to higher-order interactions, 
including multi-layer feedforward structures, together with lat- 
eral and feedback coupling. We speculate that, in particular, 
such mechanisms could contribute to the higher-order interac- 
tions found in cortex (Tang et al, 2008; Montani et al., 2009; 
Ohiorhenuan et al, 2010; Oizumi et al, 2010; Koster et al., 2013). 

A final outstanding area of research is to link tractable net- 
work mechanisms for higher-order interactions with their impact 
(or lack of impact) on information encoded in neural popula- 
tions (Kuhn et al, 2003; Montani et al, 2009; Oizumi et al, 
2010; Ganmor et al, 2011; Cain and Shea-Brown, 2013). A sim- 
ple starting point is to consider rate-based population codes in 
which each stimulus produces a different "tuned" average spike 
count (see for e.g., chapter 3 of Dayan and Abbot, 2001). One 
can then ask whether spike responses can be more easily decoded 
to estimate stimuli for the full population response (i.e., P) to 
each stimulus or for its pairwise approximation (P). In our pre- 
liminary tests where higher-order correlations were created by 
inputs with bimodal distributions, we found examples where 
decoding of P vs. P differed substantially. However, a more com- 
plete study would be required before general conclusions about 
trends and magnitudes of the effect could be made; such a study 
would include complementary approach in which the fuU spike 
responses P are themselves decoded via a "mismatched" decoder 
based on the pairwise model P (Oizumi et al., 2010). Overall, we 
hope that the present paper, as one of the first that connects cir- 
cuit mechanisms to higher-order statistics of spike patterns, will 
contribute to future research that takes these next steps. 

4. MATERIALS AND METHODS 

4.1. EXPERIMENTALLY-BASED MODEL OF A RGC CIRCUIT 

We model the response of a individual RGC using data col- 
lected from a representative primate ON parasol cell, following 
methods in Murphy and Rieke (2006); Trong and Rieke (2008). 
Similar response properties were observed in recordings from 
16 other cells. To measure the relationship between light stim- 
uli and synaptic conductances, the retina was exposed to a 
full-field, white noise stimulus. The cell was voltage clamped 
at the excitatory (or inhibitory) reversal potential Ve = 0 mV 
( V; = —60 mV), and the inhibitory (or excitatory) currents were 
measured in response to the stimulus. These currents were then 
turned into equivalent conductances by dividing by the driving 
force of ±60 mV; in other words 

^exc ^ jtXC^^y _ y^y^ V - Ve = "60 mV 

^inh ^ pnh^^y _ y^y V - V7 = 60 mV 

The time-dependent conductances and ^'"'^ were now 
injected into a different cell using a dynamic clamp procedure 
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(i.e., the input current was varied rapidly to maintain the cor- 
rect relationship between the conductance and the membrane 
voltage) and the voltage was measured at a resolution of 0.1 ms. 

4. 1. 1. Stimulus filtering 

To model the relationship between the light stimulus and synaptic 
conductances, the current measurements /™ and 1'"*^ were fit to 
a linear-nonlinear model: 

/''^(f) = N'='"^ [I™ * s(f) -I- Ti<=''^] , 

where s is the stimulus, L™*^ (L'"*^) is a linear filter, N'^^'^ (iV'"^) is 
a non-linear fiinction, and t]™ (r)'"'') is a noise term. The linear 
filter was fit by the fianction 

^'''(f) = Pexc (f/rexc)"=''= exp (-f/texc) sin (2jtf/rexc) (7) 

and the non-linear filter by the polynomial 

N'^'^^CX) = AexcX^ + Bexc^ + Cexc- (8) 

Fits minimized the mean-square distance between model and 
data. L'"'^ and N'"'' were fit using the same parametrization. 

The noise terms ti^'"^, ti^'' were fit to reproduce the statistical 
characteristics of the residuals from this fitting. We simulated the 
noise terms ti™ and r)'"'' using Ornstein-Uhlenbeck processes 
with the appropriate parameters; these were entirely characterized 
by the mean, standard deviation, and time constant of autocorre- 
lation TT],exc (tTi.inh)) as well as pairwise correlation coefficients 
for noise terms entering neighboring cells. The noise correlation 
coefficients were estimated from the dual recordings of Trong and 
Rieke (2008). 

Linear filter parameters computed (also listed in Table 1) 

were Pexc = —8 X 10''s^^ nexc = 3.6, texc = 12ms, Texc = 

105ms, and Pi^h = —1.8 x 10^ s^', Mjuh = 3.0, xi^h = 16ms, 
Tinh = 120 ms. Non-linearity parameters were Aexc = —8.3 x 
10"^ nS, 5exc = 7 X 10"^ nS, Cexc = -0.95 nS, and Ainh = 
1.67 X lO^'^nS, Binh = 6.2 x lO^^nS, Cinh = 4.17 nS. Noise 
parameters were measured to be mean(Ti^'''^) = 30, std(Ti^'"^) = 
500, t,i,exc = 22 ms, and mean(Ti["'") = -1200, std(Ti^'') = 780, 
injj = 33 ms. In addition, excitatory (inhibitory) noise to dif- 
ferent cells T)™, Ti™ (ti["*^, Tlj"*^) had a correlation coefficient of 
0.3 (0.15). 

For the filter demonstrated in Figure 4, we added a cosine 
component to the previous filter, i.e., 

j-exc,M(j) ^ p^^^_^ (f/texcM)""'-" exp (-t/Texc,M) 

X [sin (2jTt/rexc,M,s) + PexcM COS (2jTt/rexc,M,c)](9) 

Here Pexc.M = —3.2 X 10^ S^^ «exc,M = 2, texc.M = 12 ms, 
rexc,M,S = 120 ms and Texc.M.C = 100 ms, and Pjnh.M = 
-3.5 X 10^s~^ ninh,M = 2, tinh.M = 13.2ms, Pinh.M.S = 132ms 
and Tinh,M,C = HOms, while Pexc.M = Pinh.M = 0.8. 



4. 1.2. Voltage evolution 

We create a model of the cell as a non-linear integrate-and-fire 
model using the method of Badel et al. (2007), in which the 
membrane voltage is assumed to respond as 

— =F(V,t- tust) + V. (10) 
at C 

where C is the cell capacitance, t\.^st is the time of the last 
spike before time f, and /input (0 is a time-dependent input cur- 
rent. We use the current-clamp data, which yields cell voltage 
in response to the input current -fmput(0 = ~5'™'"(0(^~ Ve) ~ 
ginh^Y _ Y^^^ ^ function P(V, t). When voltage data is seg- 
regated according to the time since the last spike t — t\^^i, the I —V 
curve is well fit by a function of the form 

P(V, t- fiast) = — (El - V + Are'^-^^'/^^) (11) 

where parameters are the membrane time constant t^, rest- 
ing potential (£i), spike width Aj- and knee of the exponential 
curve Vj- 

The values of these constants differed in each bin of voltage data; 
to estimate these constants, we first extracted their values from each 
mean I — V curve. We found that these constants, as a function of 
t — tiast, were well fit by either a single exponential or a difference 
of two exponentials, with relaxation to a baseline rate (as in Badel 
et al., 2007, Figure 3). Specifically, we chose: 

— = c.t,„4 -I- Cx„,2e"*'"''""''^'"'"'' 

El = C£^,l + C£j.,2 (^e"('-'l"«'/C£i,3 _ g-(t-flast)/C£i,4^ 

Vt = cvr,i + cv^.ze"^'"""''^/'^!-.' (12) 

We obtained the coefficients by least-squares fitting to the above 
functional forms: specifically, we found that (up to four digits): 
(ct,„,i, Ct„,2, Cx,„,3) = (0.3719ms"^ 0.5412ms"^ 13.2726ms), 
(c£i,l, C£j.,2, C£j.,3, C£^,4) =(-59.4858mV, 5.8966mV, 8.3076ms, 
233.1114ms), (caj-.i, CAr,2> caj.,3, caj-,4) = (20.0487ms, 

19.0560ms, 3.6280ms, 2.4304s), and (cvj.,1, cvj.,2, cvr.a) = 
(-44.3323 mV, 25.1812 mV, 4.7653 ms). Coefficients are also 
listed in Table 2. 

The capacitance was inferred from the voltage trace data by 
finding, at a voltage value where the voltage/membrane current 
relationship is approximately Ohmic, the value of C that mini- 
mizes error in the relation Equation (10) (Badel et al., 2007). The 
estimated value was C = 28 pF. 

4. 1.3. Spiking dynamics: feedforward network 

For simulations without electronic coupling, our model neu- 
ron comprises Equations (10, 11) for V < Vdueshold; a spike was 
detected when V reached Vthreshold = — 30mV; voltage was then 
reset to V^eset = — 55 mV. The cell was then unable to spike for an 
absolute refractory period of Tabs = 3 ms. 

All simulations presented here were done in a three-cell net- 
work. 
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4. 1.4. Spiking dynamics: recurrent network 

Gap Junction coupling was introduced as an additional current on 
the right-hand side of Equation (10): 

¥ = -^E(^-^^) (13) 



The coupling strength ^^^P was held constant during a simulation. 
When coupling was present (i.e., when ^^^P / 0), g^S^p ^y^j var- 
ied from the measured level (1.1 nS) (Trong and Rieke, 2008) to 
16 times this value (17.6 nS) between simulations. When present, 
coupling was all-to-all. 

As in the feedforward model. Equations (10, 11) were inte- 
grated for V < Vthreshold) and a spike was detected when V reached 
^threshold = —30 mV. To model the voltage trajectory immediately 
following a spike, an averaged spike waveform was extracted from 
voltage traces of the same ON parasol cell used to fit Equations (10, 
11). This spike waveform was then used to replace 1 ms of the 
membrane voltage trajectory during and after a spike; at the end 
of the 1 ms, the voltage was released at approximately — 58mV. 
The cell was unable to spike for an absolute refractory period of 
Tabs = 3 ms. A relative refractory period was induced by introduc- 
ing a declining threshold for the period of 3-6 ms following a spike, 
after which Vthreshold returns to —30 mV. 



4. 1.5. Cell receptive field and stimulation 

We defined each cell's stimulus as the linear convolution of an 
image with its receptive field. The receptive fields include an ON 
center and an OFF surround, as in Chichilnisky and Kalmar (2002): 

Sj (3c) = exp ^- ^ (J - Xj) ^ Q (J - %) ^ (14) 

— fcexp ^"2'^ ~ ^-'•^ ~ 

where the parameters k and 1/r give the relative strength and size 
of the surround. Q specifies the shape of the center and was cho- 
sen to have a 1 standard deviation (SD) radius of 50 [im and to be 
perfectly circular. The receptive field locations xi,X2, and were 
chosen so that the 1 SD outlines of the receptive field centers will 
tile the plane (i.e., they just touch). Other parameters used were 
k = 0.3, r= 0.675. 

Stimulation images were defined on a 512 [im x 512 [xm grid 
that overlapped all three receptive fields. For full-field stimuli, 
light intensity was chosen be spatially constant and refreshed every 
8, 40, or 100 ms by choosing independently from the specified 
stimulus distribution (Gaussian, binary, Cauchy, or heavy-tailed 
skew). For spatially variable stimuli, a checkerboard pattern was 
imposed on the stimulation image: the intensity value in each 
checkerboard square was chosen independently and refreshed 



Table 1 | Parameters used to model the transformation of stimuli into synaptic conductances for the RGC model, as described in Equations 
(7-9). 



Model (MOD) 


Pmod (s ^ ) 


T^MOD ("is) 


"MOD 


Tmod ("is) 


•^MOD ("S) 


Bmod (iS) 




Cmod ("S) 


exc 


-8 X lO'* 


12 


3.6 


105 


-8.3 X 10-7 


7 X 10-3 




-0.95 


inh 


-1.8 X 10^ 


16 


3.0 


120 


1.67 X 10-6 


6.2 X 10-3 




4.17 


exc.M 


-3.2 X 10^ 


12 


2 


120* 


-8.3 X 10-7 


7 X 10-3 




-0.95 


inh.M 


-3.5 X 10^ 


13.2 


2 


132* 


1.67 X 10-6 


6.2 X 10-3 




4.17 


Additional parameters for monophasic filters 


Model (MOD) 








Tmod, s ("is) 


TtnoD, c ("is) 




"mod 




exc.M 








120 


100 




0.8 




inh.M 








132 


110 




0.8 





Asterisks (*) indicate parameters that are superceded by later rows; note that the monophasic filter equations contain two flltenng timescales — for example Texc.M.s 
and Tgxc.M.c, f°i' f's excitatory monophasic filter — and a relative weighting (e.g., Rbxc.m)- 



Table 2 | Coefficients used to define refractory EIF model as specified in Equations (11, 12). 



Parameter (PAR) 


CpAR,1 


CpAR,2 


CpAR,3 ("is) 


CPAR,4 ("is) 


Xm (actual fit: 1 /xm) 


0.3719 ms-i 


0.5412 ms-i 


13.2726 




Vt 


-44.3323 mV 


25.1812 mV 


4.7653 




El 


-59.4858 mV 


5.8966 mV 


8.3076 


233.1114 


Ar 


20.0487 ms 


19.0560 ms 


3.6280 


2430.4 



The parameters l/im and Vj were fit to single exponentials as functions of time, with three free parameters. The parameters Er and Aj were fit to differences 
of exponentials and therefore have four parameters. Units in the first and second columns are as stated: coefficients in the third and fourth column are in units of 
milliseconds (ms). 
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at the appropriate interval. The checkerboard pattern was first 
given a random rotation and translation relative to the receptive 
fields: this was chosen at the outset of each batch of stixel sim- 
ulations (for a total of five rotation/ translation pairs per stixel 
size, refresh rate, and stimulus distribution). Two example place- 
ments are shown in Figures S2A,D for 256 |xm and 60 [im pixels 
respectively. 

4.1.6. Numerical methods 

All simulations and data analysis were performed using MATLAB. 
Equations (10, 11) were integrated using the Euler method for 
>10^ms with a time step of 0.1ms. The synaptic noise terms, 
r]^'' and ri^"*^, as well as the light input, were generated indepen- 
dently for each simulation. In response to uniform light stim- 
uli, firing rates were 11.51 ±0.38 Hz (standard deviations given 
across a total of 60 cells; 3 cells each from 20 10^ ms simula- 
tions); 10 ms bins were used to discretize the spiking output. Firing 
rates were higher for full-field stimuli, ranging from 12 to 43 Hz 
(firing rates increased with stimulus variance); therefore shorter 
(5 ms) bins were used to discretize spike output for all other 
simulations. With this range of firing rates and bin size, multi- 
ple spikes were very rare (occurring in <1% of occupied bins). 
Empirical spiking distributions were computed from the binned 
spike data. 

For each stimulus condition, 20 simulations (or sub- 
simulations) were run, for a total integration time of 
> 20 X 10^ ms. These 20 sub-simulations were used to esti- 
mate standard errors in both the probability distribution over 
spiking events and Dyx(P, P)- Numbers reported in section 2 are, 
unless specified otherwise, produced by collating the data from the 
20 simulations. 

To fit a maximum entropy model P to an empirical probability 
distribution P, we used standard methods that have been explained 
elsewhere (Malouf, 2002). Briefly, we minimized the negative log- 
likelihood function: 

L(X) = -Y^Pix) log P(x,k) (15) 

X 

where 

P (x, X) = exp -kkfk (x) j ; 

Z\ is the partition function,/;;-, k = 1 , . . . , M is a set of functions or 
"features" of the spiking state, and Xisa vector of parameters, each 
of which serves as a Lagrange multiplier enforcing the constraint 
Ep[/fc]. For the pairwise (PME) model on N cells, X corresponds to 
N firing rates and N(N — l)/2 covariances, and the sum is over all 
possible spiking states of the system. For N = 3 there are six such 
parameters, and 

logP({xi, X2, Xi}, X) = XiXi + \2X2 + >^3^3 + >^-l,2^1^2 
+ 'k2,3.X2X3 + 1.1,3X1X3 - logZx. 

The function in Equation (15) is a convex function of the param- 
eters X which will be minimized precisely (and uniquely) when P 
matches the desired moments from P: e.g., Ep[xi] = E|,[xi]. Since 



P is in log-linear form, the result will be the maximum entropy 
distribution that matches the desired moments (Malouf, 2002). 
In principle any unconstrained gradient descent method may be 
used; we used an implementation of the non-linear conjugate 
gradient method. The KuUback Leibler divergence Dkl(J', P) was 
computed using the identity DrlC^', P) = S{P) - S(P), where S(P) 
is the entropy of P, i.e., S(P) = - ^^P(x) logP(x). 

4.7.7. Convergence testing 

To test our finding that the observed distributions were well- 
modeled by the PME fit, we also performed the PME analy- 
sis on each of the 20 simulations for each stimulus condition. 
While in general Dkl(P, P) can be quite sensitive to perturba- 
tions in P, the numbers remained small under this analysis. To 
confirm that our results for Dkl(P, P) are sufficienfly resolved to 
remove bias from sampling, we performed an analysis in which 
we collect the 20 simulations in subgroups of 1, 2, 4, 5, 10, and 
20, and plot the mean Dkl with estimated standard errors. As 
expected (e.g., Paninski, 2003), bias decreases as the length of sub- 
group increases and asymptotes at — or before — the fuU simulation 
length. 

To provide a cross-validation test for the significance of our 
reported Dkl(P, P) values, we divided our data into halves 
(which we denote Pi and P2, each including data from 10 sub- 
simulations) and performed the PME analysis on one half (say 
Pi) to yield a model Pj. We then computed -Dkl(P2>Pi) and 
Dkl{P2, Pi) (as in Yu et al., 2011), which we refer to the cross- 
validated and empirical likelihood, respectively. The former tests 
whether the PME fit is robust to over-fitting; the latter tests 
how well-resolved our "true" distribution is in the first place. 
Most cross-validated likelihoods fall on or near the identity line; 
most empirical likelihoods are close to zero [and importantly, 
significantly smaller than either Dkl (P. P) or Dkl(P2, Pl)> indi- 
cating that Dkl(P, P) is accurately resolved]. We conclude that 
the deviations that we observe when these conditions are met can 
not be accounted for by the differences in testing and training 
data. 

4.2. COMPUTATION OF SPIKING PATTERNS IN THE SIMPLIFIED MODEL 

As a simplified model of a neural circuit, we consider a vari- 
ant of the Dichotomized Gaussian (Amari et al., 2003; Macke 
et al., 2009, 2011), in which correlated inputs are thresholded 
to produce an output spike pattern. To be concrete, a set of N 
threshold spiking units is forced by a common input 1^ [drawn 
from a probability distribution Pciy)] and an independent input 
Ij [drawn fi-om a probability distribution Pi(y)]. To relate these 
functions to the other free parameters in the model, Pc(y) and 
Pl(y) were always chosen so that Ij and had mean 0 and 
variances (1 — c)a^ and ca^, respectively (so that c yields the 
Pearson's correlation coefficient of the input to two cells). The out- 
put of each cell Xj is determined by summing and thresholding 
these inputs: 

Xj = H{lj + Ic-&) (16) 

where H is the Heaviside function [H{x) = 1 if x > 0; H{x) = 0 
otherwise]. Conditioned on I^, the probability of each spike is 
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given by: 



Prob [xj = l\Ic = a] = Prob [ij + a - & > O] 
= Prob [Ij > 0 - fl] 



/ ' 



P,(y) dy 



Similarly, we have the conditioned probability that Xj = 0: 

Prob [xj = 0\ I, = a] = Prob [ij + a - & < O] 
= Prob [ij < © - fl] 

f0-o 

P,{y)dy 



I 

J—c 



Because these are conditionally independent, the probability of any 
spiking event (xi, X2, . . . , xjv) = (Ai, A2, . . . , A^) is given by the 
integral of the product of the conditioned probabilities against the 
density of the common input. 



Prob [xi = Al , . . . , = ^n] 



f 



dyPciy) 



(17) 



]~[Prob[xj = Aj\k = y] 
j=i 

The integral in Equation ( 17) is numerically evaluated via an adap- 
tive quadrature routine, such at Matlab's quad or integral. 

Four distinct unimodal inputs were used; two with heavy tails 
(Cauchy and heavy-tailed with skew), and two with sub-Gaussian 
tails (Gaussian and skewed). A random variable X is sub-Gaussian 
if the probability of large events can be bounded above by a scaled 
Gaussian; that is, if there exist constants C, c > 0 such that 

P(\X\ > X) < Cexp(-cX^) 

for all X (e.g., see Tao, 2012, p. 15). 

Unimodal inputs Ij, Ic were chosen from different marginals 
with mean 0 and variances (1 — c)a^, ca^, respectively (for sim- 
plicity, we use CT^ to refer to the variance of a generic probability 
distribution in the following three paragraphs). For Gaussian 
inputs with variance a^, P(x) oc e^^ ; for skewed inputs, 
P(x) oc (jL)e^'^+'^' Z^", for x > — (x, where the parameter a 
sets the variance 2fl(l — j) and shifting by [x = ensures that 
the mean of P(x) is zero. 

The heavy-tailed unimodal inputs were chosen so that the rate 
of tail decay would mimic the luminance statistics found in 
natural scenes (Ruderman and Bialek, 1994): 



P(x) oc 



P(x) oc 



x^ + 1' 



(x2 + 1) 



3/2 ' 



-X <x<X 



0<x<X 



for the Cauchy and heavy-tailed with skew distributions, 
respectively. A finite support of X was necessary in order to ensure 
the distributions had finite moments; X was chosen to be 1000. 
Given X, the distributions were shifted and scaled to ensure mean 
0 and variance a^. 

Bimodal inputs with variance were chosen in the following 
way: in all cases, P(x) was chosen to be a discrete distribution with 
support on two values {0, Xj i.e., P{X) = p and P(0) = 1 — p. If 
possible (i.e., if ct^ < I /4), X was chosen to be 1; otherwise, X was 
chosen so as to minimize the distance between 0 and X. Finally, 
P(x) was shifted to have the desired mean value. 
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Figure SI | Biphasic vs. monophasic filters used in simulations illustrated 
in Figure 4. 

Figure S2 | Illustration of RGC simulations with light stimuli of varying 
spatial scale ("stixels"). (A-C) For stixel size 60 |i,m, results for one 
randomly chosen stimulus position. (A) Contour lines of the three 
receptive fields (at 0.5, 1, 1.5, and 2 SD; and at the zero contour line) 
superimposed on the stimulus checkerboard (for illustration, pictured in an 
alternating black/white pattern). The red scale bar indicates 100 |xm. (B) 
Histograms of the excitatory conductances, for each cell. (C) Spike pattern 
distribution, as obtained from computational simulations of the RGC 
model ("Observed"; dark blue), and the corresponding pairwise fit 
("PME"; light pink). All eight spike patterns are shown, to allow for the 
possibility of non-symmetric responses; the three different probabilities 
labeled pi correspond to P[(1 , 0, 0)], P[(0. 1 , 0)], and P[(0, 0.1)]. (D-F) As 
in (A-C), but for stixel size 256 \Lm. Panels (E,F) demonstrate that for this 
input, both excitatory inputs and spiking responses are heterogenous 
across the RGCs. 

Figure S3 | Strength of higher-order interactions produced by the 
threshold model as input parameters vary; relationship with other output 
firing statistics. (A) For skewed common inputs: Dkl(P, P) as a function 
of input correlation c and input standard deviation a, for a fixed threshold 
0 = 1.5. Color indicates Dfj^{P. P); see color bar for range. (B) For 
skewed common inputs: Dkl(P, P) vs. firing rate E[xi] (Left) and the 
fraction of multi-information (A) captured by the PME model vs. firing rate 
E[xi] (Right). In (B), possible input parameters were varied over a broad 
range as described in section 2. Firing rate is defined as the probability of 
a spike occurring per cell per random draw of the sum-and-threshold 
model, as defined in Equation (16). Color indicates output correlation 
coefficient p ranging from black for p e (0, 0.1 ), to white for p e (0.9, 1 ), as 
illustrated in the color bars. {C,D): as in (A,B), but for heavy-tailed, skewed 
common inputs. 
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Figure S4 | The range of higher-order interactions produced by the 
threshold model varies across input type. Here, all values of Dkl(P, P) 
produced by the three-cell threshold model (previously displayed in 
Figures 7, S3) are superimposed to show the contrast between different 
input distributions. By comparing these data with data from direct 
sampling of all symmetric spiking distributions on three cells (from 
Figure 1 and shown here in yellow), one can see that only a limited set of 
output patterns are accessed by the feedforward thresholding model. 
Firing rate is defined as the probability of a spike occurring per cell per 
random draw of the sum-and-threshold model, as defined in 
Equation (16). 
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APPENDIX 

A.1 A MEASURE OF HIGHER-ORDER INTERACTIONS: Dkl^P, P) 

We begin by observing that when P is a maximum entropy distri- 
bution that approximates P (that is, it is log-linear, with coefficients 
chosen to enforce equality of a set of moments), then the KL- 
distance may be written as a difference of entropies (Cover and 
Thomas, 1991; Malouf, 2002): 

Okl(P,-P) = -S(P) + S{P) 
Here, the entropy of a probability distribution P on {0, 1)^ is given 

S{P) = -po log (po) - 3pi log (pi) - 3p2 log (p2) (18) 
-pi log (Pb) 



A.2 A MEASURE OF HIGHER-ORDER INTERACTIONS: STRAIN 

We define the strain, 

y = iogH) (19) 
\P0P2 1 

= logp3 - logpo + 31ogpi - 31ogp2 

a potential measure of the importance of higher-order interac- 
tions (Ohiorhenuan and Victor, 2010). By Equation (3), we can 
see that y = 0 precisely for a pairwise maximum entropy (PME) 
distribution. We will show that as the distribution (po, pi, p2, ps) 
is moved away from the constraint surface while fixing lower-order 
moments, the strain increases monotonically. 
From the definition of lower-order moments, 



if we use the fact that the distributions are permutation-symmetric 
[i.e., pi = P(l, 0, 0) = P(0, 1,0) = P(0, 0, 1)]. We take the loga- 
rithms in the definitions of the entropy S and KL-divergence Dkl 
to be base 2, so that any numerical values of these quantities are in 
units of bits. Using the fact that P must normalize to 1, we rewrite 

S(P) = - (1 - 3pi - ip2 - pi) log (1 - 3pi - 3p2 - pi) 
-3pi log (pi) - 3p2 log (P2) - pi log (p3) 



[I = EK] =pi+ 2p2 + p3 

p = E[XiXj] =p2+p3 

we can verify that in order to keep [l, p constant, if pi increases 
by z (i.e., pi ^ pi + z), then we must also have p2 ^ p2 ~ 2 and 
pi ^ pi + z. Then if each probability is strictly positive, then the 
derivative 



where the set of admissible distributions may now be described by 
the convex tetrahedron in M? , C = {pi,p2,p3 > 0; 3pi + 3p2 + 
Pi < 1) 

We note that the set of distributions which satisfies a desired set 
of lower order moments is given by an affine subspace (in R^, a 
line) which intersects this tetrahedron: 

|X = E[Xi] =pi+ 2p2 + p3 
p = E[Xj] =p2+p3 

Denoting this set by C^^ j;, we note that C^j^^ is a convex set and that 
S(P) is constant on each C^j_ f,. 

By straightforward differentiation we can check that the Hessian 
of — S(P) is positive definite, as long as the probabilities po,pi, etc. 
are strictly greater than zero: 



-D^S(P) 



^00 
0 f 0 

P3. 



■9 9 3' 
9 9 3 
3 3 1 



Therefore — S(P) is convex on j;; since S(P) is constant, 
Dkl(P, P) is likewise convex on C^j^ p. As a consequence, if 
Dkl(P, P) has a local minimum, then it is unique and a global 
minimum as well. Since Dkl(P, P) > 0 with equality if and only 
if P = P, this minimum must be achieved occurs when P = P; the 
maximum is likewise achieved on the boundary of the admissible 
region C^^f,. 



9y 1 

— = h — 

dz pj, + z 1 



1 



- p3 - 3pi -3p2-Z pl+Z p2 



is strictly positive as well. In particular, it is strictly positive at z = 0 
and will remain positive until z reaches a value such that one of the 
denominators reaches 0. Therefore y increases monotonically for 
z > 0 and decreases monotonically for z < 0. 

A.3 AN ANALYTICAL EXPLANATION FOR UNIMODAL vs. BIMODAL 
EFFECTS 

We consider an analytical argument to support the numeri- 
cal results that bimodal inputs generate larger deviations from 
PME model fits than unimodal inputs. As a metric, we consider 
Dkl(P, P) — where P and P are again the true and model distri- 
butions, respectively — when we perturb an independent spiking 
distribution by adding a common, global input of variance c. To 
simplify notation, the small parameter in the calculation will be 
denoted e = a/c. 

We now compute S(P) and S(P) (defined in an earlier 
Appendix) by deriving a series expansion for each set of event 
probabilities. We can compute the true distribution P using the 
expressions derived in Equation (18); to recap, let the common 
input 7c have probability density ^(7^), and the independent input 
to each cell, x, have density p5(x). Let 0 be the threshold for gener- 
ating a spike (i.e., a "1" response). For each cell, a spike is generated 
if X -|- 7^ > 0, i.e., with probability 



dUc) 



POO 



ps(x)dx. 
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Given Ic, this is conditionally independent for each cell. We can 
therefore write our probabilities by integrating over 7^ as follows: 

/oo 
pilc)il - d(k)f die 
-oo 

/oo 
piIc)d{IcKl - diIc))Ulc (20) 
-00 

/oo 
pilc)d{lcfil-dilc})dlc 
-00 

/oo 
piiMfdic 
-00 

We develop a perturbation argument in the limit of very weak com- 
mon input. That is, p(Ic) is close to a delta function centered at 
Ic = 0. Takep(7c) to be a scaled function 

p(« = J/(j) (21) 

We place no constraints on f{x), other than that it must be 
normalized (E[l] = 1) and that its moments must be finite 
(so that E[/c], E[/^], and so forth will exist, where E[g(x)] = 
f-ooS(^)f(^)dx). 

For the moment, assume that the function f{x) has a single 
maximum at x = 0. To evaluate the integrals above, we Taylor- 
expand d(x) around x = 0. Anticipating a sixth-order term to 
survive, we keep all terms up to this order. This gives, for 
small X, 

6 

d(x) « d(0) + '^kx^ + 
jt=i 

where ai = ps(@) (the other coefficients aa-fle can be given sim- 
ilarly in terms of the independent input distribution at ©). 
Substituting this into the expressions forpo> etc., above, with p(Ic) 
given as in Equation (21), gives us each event as a series in «; for 
example, 

Pi = dl + {3,axdl E[x]) € + {Oa\da + Saado) E[x^]) + . . . , 

where expectations are, again, with respect to the unsealed PDF 
f{x). The entropy S{F) is now given by using these series expan- 
sions in Equation (18). 

We note that our derivation does not rely on the fact that the 
distribution of common input is peaked at = 0 in particular. For 
example, we could have a common input centered around |x. The 
common input distribution function would be of the form 

Changing e regulates the variance, but doesn't change the mean 
or the peak (assuming, without loss of generality, that the peak 
of / occurs at zero). The peak of p{Ic) now occurs at jjl, and the 



appropriate Taylor expansion of d{x) is 

6 

d{x) ^ d{\i) + h(x - [if + Oix'), 

k=l 

where the coefficients foj- now depend on the local behavior of d 
around |x. The expectations that appear in the expansion of p3, 
and so forth, are now centered moments taken around |x; the cal- 
culations are otherwise identical. In other words, the perturbation 
expansion requires the variance of the common input to be small, 
but not the mean. 

For bimodal inputs, we consider a common input with a prob- 
ability distribution of the following form: 

so that most of the probability distribution is peaked at zero, but 
there is a second peak of higher order (here taken at 7c = 1, with- 
out loss of generality). Again, we approximate the integrals given in 
Equation (20), and therefore the entropy S(P), by Taylor expanding 
dix); 

6 

d(x) s» rf(0) + J2 '^kx'' + Oix'y, (x 0) 

k=l 
6 

^d(l) + J2h(x- l)'' + 0{ix-iy); ix^l) 

k=l 

around the two peaks 0 and 1, respectively. For each integral we 
have the same contributions from the unimodal case, multiplied 
by (1 — €^), as well as the corresponding contributions from the 
second peak multiplied by (these weightings are chosen so that 
the common input has variance of order e^, as in the unimodal 
case). This makes clear at what order every term enters. 
We now construct an expansion for the PME model P: 

1 

P (Xi, X2, X3) = 0^1 (Xl +X2+ X3) 

+X2 {X1X2 + X2Xi + X1X3)) 

We approach this problem by describing Xi and X2 as a series in 6. 
We match coefficients by forcing the first and second moments of 
P to match those of P — as they must. Specifically, take 

6 
k=\ 

k=l 

where Xi = X, X2 = 0 are the corresponding parameters from the 
independent case. The events po, pi, p2, and pi can be written as 
a series in e. We then require that the mean and centered second 
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moments of P match those of P; that is 

pi + 2p2 +P3=pi+ 2p2 + h 
p2+p3-(pl+ 2p2 + pif =p2+h-(J>l+ 2p2 + hf ■ 

At each order k, this yields a system of two hnear equations in 
and Vk; we solve, inductively, up to the desired order; we now have 
P, and therefore S(P), as a series in €. 

Finally, we combine the two series to find that in the unimodal 
case. 



Dkl{P,P) = S{P)-S(P) 



= € 



'a\ (2 EM^ - 3 EM E[x2] + E[x^]f 
2 (1 - do f dl 

If the first two odd moments of the distribution are zero 



(22) 



(something we can expect for "symmetric" distributions, such as a 
Gaussian), then this sixth-order term is zero as well. 
For the bimodal case 



IhL{P,P)=S{P)-SiP) 



(dl - dof 



2 (1 - dof dl 



+ 0(.^) 



This last term depends on the distance di — do, in other words, 
how much more likely the independent input is to push the cell 
over threshold when common input is "ON". We can also view this 
as depending on the ratio , which gives the fraction of previ- 
ously non-spiking cells that now spike as a result of the common 
input. 

The main point here, of course, is that Dkl(P, P) is of order 6* 
rather than e^. So, as the strength of a common binary vs. unimodeil 
input increases, spiking distributions depart fi-om the PME more 
rapidly. 
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